CHAPTER  1.    INTRODUCTION


Computers  are  often  considered  extensions  of  the  mind  in 
the  same  sense  that  manual  tools  are  extensions  of  the  hand. 
In  this  machine-man  analogy,  physical  measuring  devices 
(photocells,  pressure  sensitive  switches,  etc.)  are  the 
sensory cells of organs of sensation, and the computer receiving
messages from them functions as an organ of perception.
In many applications of the analogy, e.g. process control,
the computer exhibits mere sensation and reflexes. In
artificial intelligence (henceforth AI), however, computer
systems are designed expressly to exhibit perception,
comprehension, curiosity, and intention. Though rather anthropomorphic,
this  aim  is  a  primary  motivation  for  most  AI  researchers  and 
raises  many  important  philosophical  and  psychological  questions 
whose  discussion  has  already  changed  our  way  of  looking  at 
human  intelligence.   Naive  popular  interpretations  of  AI 
include  conjurations  of  evil  or  benevolent  robots  or,  no  less 
anthropomorphic,  the  amoral  oracle  which  answers  questions 
truthfully  but  refuses  ("stubbornly,  proudly")  to  consider 
the  good  or  evil  consequences.   The  escalating  impact  of 
computers  on  society  elevates  the  importance  of  philosophical 
questions  about  AI  from  resolving  esoteric  problems  to  making 
decisions  about  social  policy.   The  reader  is  directed  to  an 
eloquent  examination  of  the  philosophical  questions  by  Turing 
(BIBGEN 1947) and an investigation of the social impact of
AI by Firschein et al. (BIBGEN 1973). Such philosophical
questions  are  far  beyond  the  level  of  this  survey.   However, 
the  psychological  analogy  is  useful  in  introducing  a  unified 
analysis  of  artificial  visual  systems  whose  characteristics 
would  otherwise  appear  quite  diverse.   Since  artificial  visual 
systems  are  often  aimed  at  imitating,  improving  on,  or 
generalizing  from  natural  visual  systems,  the  analogy  is  not 
as  far-fetched  or  poetic  as  it  might  seem. 


*  See  Chapter  4  for  key  to  bibliographic  references  in  the  text. 


For  the  purposes  of  this  paper  I  will  define  scene  analysis 
as  the  computer  processing  of  two-dimensional  projected  images 
of  three-dimensional  scenes,  usually  typical  of  what  humans  see 
in  everyday  life,  to  yield  a  data  structure  which  somehow 
captures the individual identity of, and spatial relations
between,  objects  in  the  3-D  world.   Scene  analysis  is  then 
computer  visual  perception  and  comprehension.   This  rather 
narrow definition deliberately excludes holography which,
though it can give an accurate, reversible representation of
three-dimensional objects (i.e. one yielding the original 3-D
surface), treats the visual world as a mathematical surface
devoid of meaning. I also exclude picture processing which,
though often applied to enhance images of objects to be treated
by human or machine viewers as three-dimensional (e.g.
stereophotos in cartography), has not until recently addressed
itself directly to the data structure mentioned in this
paragraph's opening sentence. The two preceding areas are not meant to be slighted
by  omission;  they  are  among  the  most  fruitful  application  areas 
of  computer  processing  of  pictorial  information.   They  are 
certainly  in  a  more  advanced  state  of  application  than  scene 
analysis.   Their  strong  relation  to  scene  analysis  and  the 
area  of  overlap  will  be  discussed.   In  this  survey,  I  restrict 
the  scope  more  to  semantic,  relational  aspects  involved  in 
the  representation  of  objects  rather  than  the  purely  numerical, 
mathematical  surface  defined  by  the  locus  of  surfaces  intersected 
by  the  line  of  sight.   "Semantics"  can  mean  a  number  of  things, 
so let me also exclude the most common meaning in AI, viz.,
semantic  data  structures  such  as  Winograd's  (BIBGEN  1972)  where 
physical  objects  are  represented  devoid  of  their  geometric 
coordinates  in  terms  of  labelled  items  and  their  relations  in 
list  structures  for  the  purpose  of  manipulation  and  inference. 
This  too  is  a  very  important  and  closely  related  area  whose 
connections  with  scene  analysis  will  be  discussed  though  not 
thoroughly  surveyed.   It  is  fraught  with  many  of  the  open-ended 
problems  of  the  general  representation  of  knowledge  in  AI 
independent of vision. We then wedge ourselves narrowly between
applied mathematics (physics, geometry and computer processing
of pictures) and semantics (knowledge, meaning), using both
for  support.   I  hope  to  show  that  many  important  advances  in 
scene  analysis  implicitly  embody  the  curious  semantics  of 
projective  geometry,  but  often  expressed  in  forms  specifically 
suited  to  optics  and  digital  computation  rather  than  the 
general  axiomatic  form  used  in  mathematics. 

This  survey  consists  of  four  chapters  and  a  bibliography. 
Chapter 1, which you are now reading, is a short introduction
to  scene  analysis  and  guide  to  the  rest  of  the  paper.  Chapter  2 
is  the  most  important  one.   It  consists  of  a  history  and  state 
of  the  art  description  of  scene  analysis  techniques.   It  is 
organized  according  to  type  of  approach,  in  roughly  historical 
order.   Chapter  3  is  a  conclusion  which  contains  a  critical 
overview  of  the  techniques  and  a  preview  of  methods  which  might 
be  successfully  applied  in  the  future.   Chapter  4  is  a  guide  to 
the  bibliography.   This  includes  a  description  of  the  kinds  of 
sources  available  and  their  relation  to  research  institutions. 
Chapter  4  should  be  skimmed  immediately  in  order  to  understand 
the  format  of  references  to  the  bibliography  in  the  text  of 
this  survey. 


CHAPTER  2.   SCENE  ANALYSIS  METHODS 

2.1    Introductory  Remarks 

An  important  goal  of  scene  analysis  research  is  to  build 
artificial  eyes  connected  with  control  systems  which  enable  a 
robot  to  manipulate  and/or  maneuver  in  its  environment. 
Applications  include  the  design  of  visual  systems  for  industrial 
robots, automatic pilots for vehicles, mechanical
extraterrestrial explorers, and other devices. To organize the discussion
of  diverse  approaches,  scene  analysis  methods  are  subjectively 
divided  into  several  areas  in  this  chapter.   Two  major  areas 
are  line  analysis  and  region  analysis.   The  former  is  based  on 
classifying  and  relating  to  each  other  the  2-D  images  of  3-D 
edges  and  vertices  of  polyhedra  in  a  scene.   The  latter  is 
based  on  merging  adjacent  picture  regions  with  similar 
properties.   These  and  other  areas  will  be  described  and 
compared.   The  order  of  discussion  will  be  historical  except 
when  that  conflicts  with  organization  by  area. 

Many  of  the  techniques  discussed  appear  superficially  to 
differ greatly from each other. A deeper analysis in terms
of  projective  geometry,  however,  often  points  out  implicit 
exploitation  of  similar  phenomena.   This  underlying  connection 
is  not  often  discussed  in  the  literature  but  will  be  brought 
out  to  unify  description  of  techniques  in  this  chapter  and 
suggest  fruitful  areas  for  new  research  in  the  next. 

Scene  analysis  owes  much  to  earlier  work  in  picture 
processing  and  pattern  recognition.   These  were  the  first 
areas in which methods were developed for representing,
analyzing, and manipulating pictorial (i.e. 2-D) information by
computer.   This  essentially  geometric  information  processing 
is  quite  different  from  more  traditional  number  crunching  or 
text  manipulation  computer  applications.   Among  the  differences 
are  that  picture  manipulation  requires  much  more  processible 
memory,  and  the  processes  are  conceptually  two-dimensional 
rather than one-dimensional. The geometric problems and
techniques of pattern recognition and scene analysis have much
in common but also some crucial differences. In the
psychological model of mechanical vision, the aim of scene analysis
is to perceive and understand 2-D images of 3-D scenes. The
meaning of this analogy can be clarified using a rudimentary
informational model; this yields a natural hierarchy from
physical measurement through pattern recognition to scene
analysis. The nature of this hierarchy is examined in the
following discussion of picture processing and pattern
recognition. The reader can find comprehensive bibliographies of
picture processing accompanied by clear descriptions of
techniques in Rosenfeld's excellent surveys (BIBPIC 1969, 1972-1975).
Any of the titles containing the words "Pattern Recognition"
in BIBPIC can introduce the reader to that topic.


2.2   Pattern  Recognition  and  Scene  Analysis 

A physical measuring instrument can be considered to be a
finite  state  device,  each  of  whose  states  corresponds  to  a 
range  of  physical  conditions  of  the  object  it  is  measuring. 
Rothstein  (BIBGEN  1956)  has  shown  how  the  amount  of  information 
about  the  physical  object  derived  from  such  a  measurement  can 
be  formally  defined  in  terms  of  thermodynamic  entropy,  thereby 
linking  computational  and  physical  concepts.   Though  the 
measuring  device  is  finite  state,  and  information  consists  of 
reduction  of  uncertainty  about  which  of  these  states  correctly 
describes  the  measured  object,  the  physical  state  space  is 
usually thought of as a one-dimensional continuum with a metric.
That is, the finite set of states is well-ordered and can be
divided into subsets with that same property without limit
(until the quantum uncertainty limit), and differences between
values of state variables have meaningful (comparable)
magnitudes. The psychological analogue of such information could
be termed a sensation, such as the sensation of temperature,
brightness, or pressure; these quantities are usually considered
as belonging to one-dimensional intensity continua. The problem
of classifying such information is usually trivially solved by
dividing the continuum into ranges and ordering observations
relative to the limits of these ranges.
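
The range classification just described can be sketched in a few
lines. This is only an illustration: the limits, the labels and the
function name classify_reading are hypothetical, and the binary
search from Python's standard bisect module is simply one convenient
way of ordering an observation relative to the range limits.

```python
import bisect

# Hypothetical range limits on a one-dimensional intensity continuum
# (e.g. temperature) and one label per resulting range.
LIMITS = [10.0, 25.0]
LABELS = ["cold", "mild", "hot"]

def classify_reading(value, limits=LIMITS, labels=LABELS):
    """Return the label of the range into which `value` falls.

    A value equal to a limit is assigned to the range above it.
    """
    return labels[bisect.bisect_right(limits, value)]
```

With the limits above, a reading of 9.9 falls in the first range and
a reading of exactly 10.0 in the second.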

Pattern recognition involves the classification of patterns
consisting of the cartesian products of (usually) many
measurements into a (usually) small number of sets. The large
dimensionality of measurements not only obliterates the simple ordering
relation described in the preceding paragraph but also yields
astronomical numbers of distinct possible patterns. A common
approach to simplifying classification is to pre-process the
information by extracting features, abstractions of simple
properties defined by relations between members of subsets of the
pattern space. The number of features is usually far smaller than
the number of measurements, and they are chosen so that they
measure some pattern properties related to the desired classification.
The cartesian product of a number of features constitutes feature
space; points in this space derived from patterns which are to be
assigned to the same set in pattern classification ought to
cluster together more than points derived from differing patterns.
Once a distance function between points in feature space is defined,
classification is achieved by assigning a new point to the
nearest cluster. The kinds of features chosen are crucial to effective
clustering and highly task specific. Many of the tools of pattern
recognition are derived from sophisticated statistical decision
theory applied to the assignment of points in feature space to
clusters. Tou and Gonzalez (BIBPIC 1974) contains a detailed
description of these tools and a good bibliography for other
sources in pattern recognition. In our psychological analogy,
if measurement is associated with sensation, pattern recognition
is associated with perception. Assigning large numbers of
distinct patterns to a single class corresponds to recognizing a
form or percept in any one of a large number of differing
particular presentations.
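
As a minimal sketch of nearest-cluster classification in feature
space, the following assumes two things the text leaves open: the
distance function is Euclidean, and each cluster is summarized by the
centroid of its known members. The feature values and the names
centroid and classify are made up for illustration.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    count = len(points)
    return [sum(p[i] for p in points) / count for i in range(len(points[0]))]

def classify(point, clusters):
    """Return the label of the cluster whose centroid is nearest to `point`.

    `clusters` maps a class label to the feature vectors already
    assigned to that class. Euclidean distance is an assumption.
    """
    return min(clusters,
               key=lambda label: math.dist(point, centroid(clusters[label])))

# Two hypothetical classes in a two-feature space.
CLUSTERS = {
    "A": [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1]],
    "B": [[5.0, 5.0], [5.2, 4.9], [4.8, 5.3]],
}
```

A new point such as [1.1, 0.9] is assigned to class "A". Statistical
decision theory replaces the plain centroid distance used here with
measures that account for the spread of each cluster.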

In visual (also called optical or pictorial) pattern
recognition, the patterns to be classified are usually gray-level (also
called grey-scale) pictures. These are 2-D pictures which have
been divided into cells by a regular grid. Each cell is
characterized by a single number (measurement) corresponding to the
integrated intensity of light throughout that cell. Any picture
is then a pattern or point in a space with as many dimensions as
there are cells in the grid. Hence, for an n x n grid, the
pattern space has dimension n^2. The earliest efforts in
pictorial pattern recognition were directed to the problem of
recognizing printed characters such as numerals or letters of the
alphabet by template matching, described as follows. The
gray level in each cell is either 1 or 0 depending on whether ink
(darkness) is present or absent in that part of the picture. The
ideal example of any character is called a template and is
represented by an n^2-component vector of 1's and 0's.
Classification of other samples is achieved by identifying each with the
ideal (template) vector it best matches; by "best" is meant that
for which the largest number of corresponding components match in
the two vectors. The method fails completely if the sample to be
recognized is not almost identical in size, position and
orientation to the template. Two-dimensional transformations "scramble"
the n^2-dimensional vector. This limitation can be overcome by
exploiting the fact that the n^2 vector can be "unfolded" into the
two-dimensional picture space. In this much more tractable space,
applying simple geometric transformations can invert the effects
of 2-D translation, rotation, scale change and other one-to-one
transformations to yield a pattern in registration with any
template. Such techniques have been applied to yield optical
character recognition (OCR) devices capable of reading printed
text into a computer automatically.
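
Template matching as described above can be sketched directly. The
3 x 3 templates below are invented for illustration, and the names
match_score and classify are hypothetical; each picture is stored as
its n^2-component vector of 1's and 0's, row by row.

```python
def match_score(sample, template):
    """Number of corresponding components that agree in the two vectors."""
    return sum(1 for s, t in zip(sample, template) if s == t)

def classify(sample, templates):
    """Return the name of the template that best matches `sample`."""
    return max(templates, key=lambda name: match_score(sample, templates[name]))

# Hypothetical 3 x 3 binary templates, flattened row by row (n^2 = 9).
TEMPLATES = {
    "I": [0, 1, 0,
          0, 1, 0,
          0, 1, 0],
    "L": [1, 0, 0,
          1, 0, 0,
          1, 1, 1],
}

# An "L" with one cell of ink missing.
noisy_L = [1, 0, 0,
           1, 0, 0,
           1, 1, 0]

# An "I" translated one cell to the left.
shifted_I = [1, 0, 0,
             1, 0, 0,
             1, 0, 0]
```

The noisy "L" is still classified correctly, but the translated "I"
agrees with its own template in only 3 of 9 components and is
assigned to "L". This is the failure under translation noted in the
text; it is why the sample must first be brought into registration.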

When  geometric  transformations  are  unpredictable,  not 
invertible  or  too  complex,  for  example  in  recognizing  handwritten 
characters,  the  methods  described  in  the  preceding  paragraph  are 
inadequate. In more complex problem domains, feature extraction
can be very helpful. Unlike the case of general pattern
recognition, where feature extraction operates over the many-dimensional
space of the cartesian product of measurements, in pictorial
pattern recognition feature extraction usually operates over the
two-dimensional picture space. That is, features in the latter
case often represent simple geometric properties. Some
correspond to local relations such as shape, orientation, curvature,
contrast  and  number  of  lines  at  an  intersection  while  others 
reflect  more  global  relations  such  as  topological  connectivity, 
visual  texture,  repetitiveness,  size  or  alignment  of  parts. 
Spatial  frequency  analysis  including  hologram  analysis  can  be 
regarded  as  the  extraction  of  global  features  related  to  geometric 
symmetries.   Presence  of  such  features  corresponds  to  peaks  in 
the frequency domain. Families of orthogonal square wave functions
such as the Walsh, Haar and Hadamard functions (e.g. Gerardin and
Flament, BIBPIC 1969) are computationally much simpler for
spatial  analysis  than  the  trigonometric  functions  used  in  Fourier 
analysis  partly  because  inner  products  in  the  former  case  can 
reduce  to  boolean  operations  and  also  because  their  digital  nature 
is  more  appropriate  for  grey-scale  (digitized)  pictures.  However, 
the  Fourier  transform  itself  can  be  cheaply  and  rapidly  effected 
using  optical  holography.   A  useful  characteristic  of  spatial 
frequency  analysis  is  that  geometric  distortions  and  their 
inverses  in  the  picture  domain  often  correspond  to  easily  expressed 
changes  in  the  transform  domain,  greatly  simplifying  the  problem 
of  picture  registration  in  generalizations  of  template  matching. 
Another  useful  characteristic  is  that  the  image  degrading  effects 
of high frequency noise resulting from digitization or poor optics
can  be  effectively  countered  by  low-pass  filtering  in  the  frequency 
domain.   Also,  spatial  integration  and  derivative  taking  are 
expressed  in  simple  algebraic  terms  in  the  transform  domain.  The 
disadvantages  of  spatial  frequency  analysis  are  computational 
expense  and  inflexibility  in  choice  of  features.   The  technique 
is  useful  only  if  certain  kinds  of  global  geometric  symmetries 
are suitable for distinguishing pattern classes. Then, special
purpose hardware can overcome the computational expense of general
purpose software. Fingerprint classification and visual texture
analysis are examples of tasks where this is true. One advantage
in extracting features of whatever type is that pattern
classification can often be accomplished by template matching in the 2-D
transform space more easily than in the n^2 dimensional picture space.
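
To illustrate why square wave families are computationally simple,
here is a sketch of a one-dimensional fast Walsh-Hadamard transform.
It is not taken from the cited work; the unnormalized form and the
natural (Hadamard) ordering of coefficients are assumptions made for
brevity. Every inner product with a +1/-1 square wave reduces to
additions and subtractions, with no multiplications or trigonometric
evaluations.

```python
def walsh_hadamard(values):
    """Unnormalized fast Walsh-Hadamard transform (natural ordering).

    The input length must be a power of two. Only additions and
    subtractions of the input values are performed.
    """
    data = list(values)
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    half = 1
    while half < n:
        for start in range(0, n, 2 * half):
            for i in range(start, start + half):
                a, b = data[i], data[i + half]
                data[i], data[i + half] = a + b, a - b
        half *= 2
    return data

# A strictly periodic scan line of a binary picture.
periodic = [1, 0, 1, 0]
coefficients = walsh_hadamard(periodic)
```

For the periodic scan line the transform is [2, 2, 0, 0]: the
repetitiveness shows up as a peak in a single non-constant
coefficient. Applying the transform twice returns the original
values scaled by the length, so the transform is its own inverse up
to a constant.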

Contrast features have been particularly useful in a wide
variety of applications, including scene analysis. These are
local features corresponding to small, simple neighborhoods
exhibiting strong variation in the gray levels of their
constituent cells. In simple pictures composed of relatively large
uniformly  light  areas  differing  from  each  other  in  gray  level, 
such  high  contrast  neighborhoods  outline  the  regions.   The 
picture  can  then  be  represented  as  a  collection  of  chains  of  high 
contrast  boundary  curves.   In  a  sense  this  corresponds  to  storing 
only  the  locations  of  non-zero  gray-level  gradients  rather  than 
all  gray  levels.   Since  gradients  are  zero  in  large  uniform 
regions,  far  less  information  is  needed  to  specify  the  picture 
than  the  gray-level  representation,  though  the  latter  can  be 
reconstructed  by  a  computation  analogous  to  integration.   This 
economy  is  vital  because  the  gray-level  representation  often 
requires  more  central  computer  memory  than  is  available  at  one 
time. 
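
The economy of storing only non-zero gradients can be sketched for a
single scan line. This is a one-dimensional simplification chosen for
brevity (the text concerns 2-D boundary curves), and the names
encode_row and decode_row are hypothetical. Reconstruction is a
running sum, the discrete analogue of integration.

```python
def encode_row(row):
    """Keep the first gray level plus (position, jump) for each change."""
    jumps = [(i, row[i] - row[i - 1])
             for i in range(1, len(row)) if row[i] != row[i - 1]]
    return row[0], jumps, len(row)

def decode_row(first, jumps, length):
    """Rebuild the gray levels by accumulating the stored jumps."""
    jump_at = dict(jumps)
    row, level = [], first
    for i in range(length):
        level += jump_at.get(i, 0)
        row.append(level)
    return row

# Three uniform regions along one scan line.
scan_line = [3, 3, 3, 7, 7, 7, 7, 2, 2]
encoded = encode_row(scan_line)
```

Nine gray levels reduce to a starting level and two jumps, and
decode_row(*encoded) recovers the scan line exactly. The saving grows
with the size of the uniform regions.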

Representing  any  curve,  including  a  boundary,  by  encoding 
differences  between  coordinates  of  successive  cells  along  the 
curve  rather  than  listing  locations  is  called  chain  encoding. 
This  difference  in  coordinates  corresponds  roughly  to  the 
digitized  derivative  or  slope  and  economizes  storage  for  reasons 
analogous  to  those  given  in  the  preceding  paragraph  describing 
the  gray-level  to  gradient  transition.   In  addition,  however, 
chain  encoding  yields  digitized  descriptions  of  geometric  shape, 
independent  of  position,  which  can  be  used  to  compute  such 
measures  as  area,  curvature  and  perimeter  as  well  as  express 
the  changes  in  patterns  resulting  from  geometric  transformations. 
Such digital computations have been developed by Freeman (BIBPIC
1974), Weiman and Rothstein (BIBTR 1972), and Rothstein and
Weiman (BIBPIC 1976). Applications range from image processing
to computer graphics. The discrete expression of these
geometric computations is often completely different from their
expression in continuous mathematics. The processes of geometric
abstraction outlined above play an important role in pattern
recognition in part because of the resulting computational
economies. One such economy is realized in approximating curves
by straight line segments. Since each segment is completely
determined by its endpoints, only they need be stored. In scene
analysis, straight lines also play an important role because
of the projective geometric relation between the 3-D scene and
its 2-D image.
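
A minimal sketch of chain encoding follows. The numbering of the
eight directions (0 = east, counting counterclockwise with y upward)
is the convention commonly associated with Freeman's chain code and
is assumed here; the function names are hypothetical. Perimeter and
enclosed area are computed from the chain alone, without reference to
absolute position.

```python
import math

# Assumed direction codes: 0 = east, increasing counterclockwise.
STEPS = [(1, 0), (1, 1), (0, 1), (-1, 1),
         (-1, 0), (-1, -1), (0, -1), (1, -1)]

def chain_encode(cells):
    """Encode a path of 8-connected (x, y) cells as direction codes."""
    return [STEPS.index((x1 - x0, y1 - y0))
            for (x0, y0), (x1, y1) in zip(cells, cells[1:])]

def perimeter(chain):
    """Path length: unit steps for even codes, diagonal steps for odd."""
    return sum(1.0 if code % 2 == 0 else math.sqrt(2.0) for code in chain)

def enclosed_area(chain):
    """Signed area of a closed chain (positive if counterclockwise)."""
    x = y = 0
    twice_area = 0
    for code in chain:
        dx, dy = STEPS[code]
        twice_area += x * dy - y * dx
        x, y = x + dx, y + dy
    return twice_area / 2

# Boundary of a 2 x 2 square traced counterclockwise, and a translated copy.
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2),
          (1, 2), (0, 2), (0, 1), (0, 0)]
moved = [(x + 10, y - 3) for x, y in square]
```

Both squares yield the same chain, [0, 0, 2, 2, 4, 4, 6, 6], which
shows the independence of position mentioned in the text. Rotating
the figure by a multiple of 45 degrees corresponds to adding a
constant to every code modulo 8.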

Dividing  a  picture  into  regions  using  any  criteria,  including 
high-contrast  boundaries,  leads  to  an  abstract  description  in 
terms of region adjacency. That is, each region can be
represented as a node in a graph, and edges connect nodes representing
regions  sharing  a  common  border.   Besides  being  informationally 
economical,  the  picture  is  described  in  terms  of  topological 
relations  rather  than  the  exact  positions  and  shapes  of  its  parts. 
Such  a  description  is  invariant  under  a  large  group  of  geometric 
and  other  transformations.   This  approach  has  been  generalized  to 
what  are  called  linguistic  methods  in  pattern  recognition  (see 
Miller  and  Shaw,  BIBPIC  1968  and  Kaneff,  BIBPIC  1970).   In  the 
linguistic  approach  the  relational  description  of  a  picture  is 
more important than simple classification. The relations may be
much more general than topological connectivity but the
description is still graphical. The graph can be modified by rules
from a graph grammar to either parse or generate "legal" graphs,
i.e. those corresponding to valid pictures. This is a "more
intelligent" process than classification of pictures in the same
sense that parsing a sentence (word) in a formal language is a
"more intelligent" process than simply recognizing its
constituent words (symbols). It has the important advantage over
classification that a potentially infinite class of pictures, i.e.
including pictures never seen before, can be recognized or
described. This kind of relational analysis is an important
characteristic of many scene analysis methods. In our
psychological model, it corresponds to comprehension of a whole as
more than the sum of its perceived or recognized constituent parts.
The scene is comprehended as an organization, rather than merely
a collection, of objects.
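
The region adjacency description introduced at the start of this
discussion can be sketched as follows. The picture is assumed to have
been divided into regions already, so each grid cell carries a region
label; the labels, the use of 4-connectivity and the function name
region_adjacency are assumptions made for illustration.

```python
def region_adjacency(labels):
    """Build {region: set of adjacent regions} from a grid of region labels.

    Two regions are adjacent if some pair of horizontally or
    vertically neighbouring cells carries their two labels.
    """
    graph = {}
    rows, cols = len(labels), len(labels[0])
    for r in range(rows):
        for c in range(cols):
            here = labels[r][c]
            graph.setdefault(here, set())
            for rr, cc in ((r + 1, c), (r, c + 1)):
                if rr < rows and cc < cols and labels[rr][cc] != here:
                    there = labels[rr][cc]
                    graph[here].add(there)
                    graph.setdefault(there, set()).add(here)
    return graph

small = [["a", "a", "b"],
         ["a", "c", "b"],
         ["d", "d", "d"]]

# The same arrangement stretched horizontally.
stretched = [["a", "a", "a", "b"],
             ["a", "c", "c", "b"],
             ["d", "d", "d", "d"]]
```

Both pictures produce the same graph, which illustrates the
invariance of the description under geometric transformations that
preserve which regions share a border.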

There  are  phenomena  in  the  process  of  projecting  3-D  scenes 
into  their  2-D  images  which  traditional  2-D  pattern  recognition 
techniques do not address. Paramount among these is
foreshortening, a distortion in which the 2-D distance between image points
depends on the orientation of the line connecting the
corresponding 3-D points relative to the line of sight and also on their
distance from the image plane. The extreme case of foreshortening
occurs when the line connecting two 3-D points is a line of sight
(goes through the point of projection). For opaque objects, the
usual case, this yields occlusion, a discontinuity that abruptly
erases part of an object's image that may contain important
features for recognition. This phenomenon causes hidden surfaces
and, on 3-D rotation of an object in a scene, yields a succession
of 2-D views that cannot be continuously transformed into each
other. Projection is a many-to-one function and therefore not
invertible. This loss of information and the discontinuities
in 2-D images of continuously transformed 3-D objects cannot be
overcome by ordinary feature extraction, 2-D geometric
transformations, or template matching.
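
Foreshortening and the many-to-one character of projection can be
made concrete with a small sketch. The coordinate conventions are
assumptions: the point of projection is the origin, the line of sight
is the z axis, and the image plane lies at z = focal. The names
project and image_length are hypothetical.

```python
import math

def project(point, focal=1.0):
    """Central projection through the origin onto the plane z = focal."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the point of projection")
    return (focal * x / z, focal * y / z)

def image_length(p, q, focal=1.0):
    """2-D distance between the images of two 3-D points."""
    return math.dist(project(p, focal), project(q, focal))

# A rod of unit length centred at depth 5, in two orientations.
across = ((-0.5, 0.0, 5.0), (0.5, 0.0, 5.0))   # perpendicular to line of sight
along = ((0.0, 0.0, 4.5), (0.0, 0.0, 5.5))     # lying on the line of sight
```

The rod seen across the line of sight has an image of length 0.2,
while the same rod lying along the line of sight collapses to a
single image point. Likewise (1, 2, 4) and (2, 4, 8) lie on one line
of sight and share the image point (0.25, 0.5), so the 3-D positions
cannot be recovered from one image alone.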

The  information  loss  in  projection  is  structured  in  the 
sense  that  any  two  3-D  points  that  are  mapped  into  one  image 
point  lie  on  a  straight  line  through  the  point  of  projection, 
i.e. a line of sight. If there are two points of projection,
then the intersection of two lines of sight uniquely determines
the  position  of  a  point  in  three  dimensions,  a  property  useful 
in  binocular  stereoscopic  vision.   Geometrically  related  to  this 
property  is  the  fact  that  a  line  in  three  dimensional  space 
together  with  the  point  of  projection  determine  a  plane  in 
three  space.   This  plane  intersects  the  image  plane  in  a  straight 
line;  hence  3-D  straight  lines  have  2-D  images  in  the  picture 
plane that are also straight lines. This fact, together with
the informational economy in representing straight lines
discussed earlier in the paragraph on chain encoding, makes it
natural that many of the earliest efforts of scene analysis were
directed to analyzing scenes containing straight lines. Such
scenes  contained  polyhedra,  3-D  objects  bounded  by  planar  faces 
which  therefore  intersect  along  straight  line  edges.   The  2-D 
images  of  the  faces  are  regions  bounded  by  straight  lines.  In 
simple  lighting  situations  a  planar  face  is  usually  uniformly 
bright,  but  the  different  orientations  of  faces  relative  to  the 
light  source  and  the  observer  yield  image  regions  of  different 
gray  levels.  Thus,  the  images  of  edges  may  be  found  by  using 
contrast  features.   We  shall  briefly  examine  some  of  the  techniques 
used  for  finding  these  straight  lines  in  gray-level  pictures 
before  discussing  the  analysis  of  polyhedral  scenes. 
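
The recovery of 3-D position from two points of projection, mentioned
above, can be sketched for the simplest stereo arrangement. The
geometry is an assumption made for brevity: two viewpoints on the x
axis separated by a known baseline, parallel lines of sight along z,
and a common image plane at depth focal. The function names are
hypothetical.

```python
def project(point, eye_x=0.0, focal=1.0):
    """Image of `point` for a viewpoint at (eye_x, 0, 0)."""
    x, y, z = point
    return (focal * (x - eye_x) / z, focal * y / z)

def triangulate(left, right, baseline, focal=1.0):
    """Intersect the two lines of sight of a parallel-axis stereo pair.

    `left` is the image seen from (0, 0, 0) and `right` the image seen
    from (baseline, 0, 0). Returns the 3-D point (x, y, z).
    """
    disparity = left[0] - right[0]
    if disparity <= 0:
        raise ValueError("lines of sight do not meet in front of the viewer")
    z = focal * baseline / disparity
    return (left[0] * z / focal, left[1] * z / focal, z)

scene_point = (1.0, 2.0, 4.0)
left_image = project(scene_point, eye_x=0.0)
right_image = project(scene_point, eye_x=0.5)
recovered = triangulate(left_image, right_image, baseline=0.5)
```

Either image alone fixes only a line of sight; the difference between
the two image positions fixes the depth along it. In practice the
hard part, not shown here, is deciding which point in one image
corresponds to which point in the other.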

Identifying  straight  lines  in  gray-level  pictures  was  a 
much  more  difficult  task  than  originally  anticipated  by  researchers 
in  scene  analysis.   It  appeared  that  one  need  only  trace  along 
adjacent  high-contrast  features  in  pictures  of  polyhedra  to  get 
straight  lines,  but  in  reality  such  features  were  often  spuriously 
present  in  noisy  picture  regions  not  near  straight  lines  and 
difficult  to  detect  in  some  places  along  faint  straight  lines. 
Though  smoothing  (or  low  pass  filtering)  reduces  noise,  it  also 
suppresses  contrast.   Special  contrast  feature  extractors  can  be 
designed,  however,  which  paradoxically  enhance  contrast  while 
suppressing  noise;  they  combine  spatial  smoothing  and  the  taking 
of  spatial  derivatives.   Smoothing  a  function  can  be  accomplished 
by  taking  the  convolution  of  the  function  with  a  fixed  smoothing 
function  such  as  the  gaussian  function.   In  the  discrete  version 
of  the  process,  the  convolution  integral  reduces  to  a  weighted 
average  of  values  in  a  local  neighborhood.   The  result  in  either 
version  is  "blurring"  or  smoothing.   Now,  regions  of  high  contrast 
are  characterized  by  extreme  values  of  derivatives.   In  a  picture, 
derivatives are defined as the limits of ratios of intensity
differences to the distance between the points at which the
intensities are observed.


From  now  on,  the  word  "scene"  refers  to  the  3-D  configuration 
whose  image  is  projected  onto  a  2-D  picture. 
In the discrete version the limit cannot proceed below grid cell
size,  which  if  taken  to  equal  unity  reduces  the  derivative  to  a 
kind  of  weighted  finite  difference.   Smoothing  and  derivative 
taking  in  the  continuous  case  can  be  conceptually  unified  by 
considering  their  representations  in  Laplace  transform  theory. 
There, convolution corresponds in transform space to multiplication
of the pattern transform by the transform of the smoothing
function. The transform of the derivative of a pattern is simply
a factor (the transform variable) times the transform of the
pattern. Since products
commute,  the  same  result  is  effected  by  taking  the  derivative  of 
a  smoothed  pattern  as  by  taking  the  convolution  of  the  pattern 
with  the  derivative  of  the  smoothing  function.   In  discrete  form, 
the latter corresponds to taking a weighted average of the
picture  points  in  a  neighborhood  where  the  weights  may  be  negative 
as  well  as  positive.   Thus,  smoothing  and  derivative  taking  can 
be  accomplished  in  one  step  no  more  complex  computationally  than 
smoothing.   Smoothing  is  analogous  to  a  0th  order  difference,  and 
various  higher  order  derivatives  are  analogous  to  higher  order 
differences.   In  the  2-D  case  partial  derivatives  correspond  to 
finite  differences  or  weighting  functions  over  neighborhoods 
rather than intervals. The choice of diameters of these
neighborhoods is critical in effective feature extraction. Ordinarily
"noise" grain is much finer than "signal" (or true picture) grain.
If the diameter of averaging neighborhoods falls between these two
grain sizes, both noise suppression and contrast enhancement can
occur simultaneously, resulting in a better signal-to-noise ratio.
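
The equivalence of smoothing followed by differencing and a single
pass with a combined weight mask can be checked on a one-dimensional
example. The particular weights are assumptions chosen so that the
arithmetic stays in integers: an unnormalized 1-2-1 average for
smoothing and a two-cell difference for the derivative.

```python
def convolve(signal, kernel):
    """Full discrete convolution of two sequences."""
    out = [0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out

SMOOTH = [1, 2, 1]        # unnormalized weighted average
DIFFERENCE = [1, 0, -1]   # first order finite difference

# Derivative of the smoothing function: one mask with weights of both signs.
EDGE_MASK = convolve(SMOOTH, DIFFERENCE)

# A noisy scan line crossing one edge between gray levels near 10 and 30.
scan_line = [10, 11, 9, 10, 10, 30, 29, 31, 30, 30]

two_passes = convolve(convolve(scan_line, SMOOTH), DIFFERENCE)
one_pass = convolve(scan_line, EDGE_MASK)
```

The combined mask is [1, 2, 0, -2, -1], and the two results agree
element for element. Note that the full convolution also responds at
the two ends of the scan line, where the data abruptly begin and end;
a practical detector would ignore or pad those border cells.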

Almost  all  contrast  feature  detectors  can  be  considered  as 
weighted  averagers  as  described  in  the  preceding  paragraph. 
Examples include Shirai's contrast detector (BIBSA 1973), which
resembles a smoothing and first order derivative taken in a
direction perpendicular to a line. The Laplacian operator is a
sum of second order partial derivatives; Horn (BIBTR 1972 and BIBPIC 1974)
discusses  its  discrete  analog.   Hueckel's  (BIBPIC  1971,  1973) 
operators  are  combinations  of  finite  differences  of  various  orders; 
for  that  reason  they  are  particularly  versatile  in  detecting  and 
describing a broad range of types of contrast; however, they are
somewhat more expensive computationally than simpler differencing schemes.
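
As an illustration of a contrast detector expressed as a weighted
average, the sketch below applies a common four-neighbor discrete
Laplacian mask. This mask is a textbook choice and is not claimed to
be the discrete analog discussed by Horn; the function name
apply_mask and the test picture are hypothetical.

```python
# Assumed mask: four-neighbor discrete Laplacian (weights sum to zero).
LAPLACIAN = [[0, 1, 0],
             [1, -4, 1],
             [0, 1, 0]]

def apply_mask(picture, mask):
    """Weighted sum over each interior 3 x 3 neighborhood.

    Border cells, which lack a full neighborhood, are left at zero.
    """
    rows, cols = len(picture), len(picture[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            out[r][c] = sum(mask[i][j] * picture[r + i - 1][c + j - 1]
                            for i in range(3) for j in range(3))
    return out

# A vertical edge between a dark region (0) and a light region (5).
edge_picture = [[0, 0, 5, 5] for _ in range(4)]
response = apply_mask(edge_picture, LAPLACIAN)
```

Each interior row of the response is [0, 5, -5, 0]: the mask gives a
positive and a negative value on either side of the edge and zero in
uniform regions, since its weights sum to zero.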

Smoothing  and  taking  spatial  derivatives  of  a  picture  of  a 
polyhedral  scene  yields  distinctive  ridges  and  valleys  along  the 
images  of  polyhedral  edges  and  yields  flat  regions  where  high 
frequency  noise  or  uniform  intensity  prevail.   The  TRACK  program 
package  developed  at  MIT  and  described  by  Lerman  and  Woodham 
(BIBTR 1973) reduces such a picture to a line drawing. It uses a
Shirai  type  contrast  detector  to  locate  points  of  high  contrast; 
with  these  it  associates  more  distant  points  lying  close  to  the 
best  straight  line  fitting  the  original  points,  progressing  from 
point  to  point.   A  straight  line  is  thus  extended  in  the  direction 
best  fitting  (according  to  some  mean  square  error  criterion)  those 
points  already  assigned  to  it.   When  no  more  feature  points  are 
found  in  that  direction,  the  line  is  terminated.   Thresholds  can 
be  set  to  "tune"  the  program  to  various  contrast  levels  and 
acceptable  line  lengths.   In  the  end,  the  data  structure  that 
TRACK  presents  to  scene  analysis  programs  consists  of  lines  speci- 
fied by  their  endpoints.  Neither  gray-levels  nor  features  remain. 
Using  statistical  mathematical  models  for  noise,  edge  image 
blurring,  and  light  intensity  variations  over  polyhedral  surfaces 
Griffith   (BIBPIC  1971,  1973)  derived  theoretically  the  acceptance 
criteria  for  straight  line  determination  that  are  found  experi- 
mentally (by  "tuning"  thresholds)  in  the  TRACK  program  package. 
Duda  and  Hart  (BIBPIC  1972)  use  a  rather  different  method  for 
recognizing  straight  lines  in  noisy  situations  which  was  applied 
by  O'Gorman  and  Clowes  (BIBPIC  1973).   It  involves  first  finding 
high-contrast  neighborhoods  and  then  scanning  these  in  two  perpen- 
dicular directions  with  feature  extractors  resembling  first  order 
derivatives.  These  two  "directional  derivatives"  are  components 
of  the  local  light  intensity  gradient.   The  direction  of  any  local 
edge  is  perpendicular  to  this  gradient,  and  can  be  considered  as 
a  small  segment  of  a  straight  line.   The  so-called  Hough  Transform 
is  applied  to  each  small  line  segment  (local  edge)  to  represent  it 
as  a  point  in  a  space  whose  two  dimensions  correspond  to  the  angle
the  local  edge  made  with  the  x-axis  in  the  picture  space  and  the
distance  of  the  line  on  which  it  might  lie  from  the  origin.  Any 
extended  line  in  picture  space  will  yield  a  small  dense  cluster 
of  such  points  in  Hough  Transform  space.   The  degree  of  clustering
is  therefore  used  as  a  criterion  in  deciding  whether  a
line  exists  in  the  picture;  its  endpoints  can  be  found  by  locating 
extreme  values  of  the  coordinates  of  the  neighborhoods  corres- 
ponding to  the  edges  in  question.   A  crucial  difference  between 
this  method  and  tracking  methods  is  that  orientation  of  a  feature 
and  its  existence  anywhere  on  the  line  locus  are  used  as  evidence 
rather  than  feature  continuity  with  respect  to  tracking  order. 
The  result  is  that  a  line  can  be  recognized  even  if  interrupted 
by  many  large  gaps  or  crossed  by  other  curves;  in  such  situations 
tracking  could  fail.   The  end  result  of  the  Duda-Hart  process  is 
also  a  data  structure  representing  straight  line  segments  whose 
end  points  are  specified. 
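
The mapping from local edges to points in the transform space, and the use of clustering as evidence for a line, can be sketched as follows. This is an illustrative reconstruction from the description above, not the Duda and Hart or O'Gorman and Clowes code; the function names, the bin sizes, and the sample edges are assumptions.

```python
import math
from collections import Counter


def edge_to_parameters(x, y, theta):
    """Map a local edge at (x, y) with direction `theta` (radians from the
    x-axis) to (theta, rho), where rho is the signed distance from the origin
    to the line through (x, y) in that direction."""
    rho = x * math.sin(theta) - y * math.cos(theta)
    return theta, rho


def accumulate(edges, theta_step=math.radians(5), rho_step=1.0):
    """Count how many local edges fall into each (angle, distance) cell."""
    votes = Counter()
    for x, y, theta in edges:
        t, r = edge_to_parameters(x, y, theta)
        votes[(round(t / theta_step), round(r / rho_step))] += 1
    return votes


# Hypothetical local edges: four lie on the line y = x + 2 with large gaps
# between them; two are unrelated.
diagonal = math.radians(45)
edges = [(0, 2, diagonal), (1, 3, diagonal), (5, 7, diagonal), (9, 11, diagonal),
         (3, 0, 0.0), (4, 8, math.radians(90))]

votes = accumulate(edges)
print(votes.most_common(1))  # the densest cell and its vote count
```

The four collinear edges fall into one cell although they are not contiguous in the picture, which is the property that distinguishes this method from tracking. Endpoints would then be taken from the extreme coordinates of the edges that voted for the winning cell.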

If,  in  our  psychological  analogy,  light  intensity  measurement 
corresponds  to  sensation,  then  pattern  recognition,  including 
straight  line  recognition,   could  be  called  perception.  The  next 
level  up  in  the  hierarchy  is  then  comprehension.   In  artificial 
vision  we  shall  here  consider  comprehension  as  understanding 
something  about  the  3-D  relations  between  parts  whose  2-D  images 
have  been  perceived.   The  goal  of  scene  analysis  is  that  kind  of 
comprehension.   We  will  now  examine  some  of  the  first  approaches 
to  recognizing  scenes  consisting  of  polyhedra  based  on  the  line 
drawings  of  their  images. 

2.3   Approaches  to  Scene  Analysis 

2.3.1  Analysis  of  Straight  Line  Representations  of  Polyhedra 

In  the  history  of  scene  analysis  Roberts  (BIBSA  and  BIBTR 
1965)  is  credited  with  taking  the  first  step  in  computer  interpre- 
tation  of  a  2-D  picture  as  a  monocular  view  of  a  3-D  scene. 
His  method  was  an  extension  of  known  picture  processing  and
pattern  recognition  techniques.   First,  the  grey-scale  picture  is
reduced  to  a  line  drawing  by  fitting  lines  to  contrast  features. 
Recognition   is  accomplished  by  geometric  transformation  and 
generalized  template  matching,  but  in  the  3-D  realm  rather  than 
the  2-D  picture  space.   Picture  regions  are  considered  to  be  the 
images  of  polygonal  faces  of  polyhedra.   The  polyhedra  in  the 
3-D  scene  are  restricted  to  be  examples  of  certain  known  models, 
hence  the  2-D  images  of  faces  can  only  be  of  certain  known  types, 
subjected  to  projective  distortions.   The  final  image  is  a  product 
of  rigid  motions  of   the  polyhedra  in  three  dimensions  and  the 
projective  transformation.   In  cartesian  coordinates  the  former 
are  linear  and  the  latter  nonlinear  transformations.  Roberts 
overcame  the  computational  difficulties  in  expressing  this  mixture 
by  using  homogeneous  coordinates  which  permit  a  unified  linear 
representation  of  all  relevant  transformations.  This  notation 
originated  in  projective  geometry  (see  Coxeter  BIBGEN  1964)  where
it  greatly  simplified  and  unified  description  of  phenomena  difficult
or  impossible  to  describe  in  Euclidean  geometry.   The  next
step  in  Roberts'  recognition  process  is  to  invert  the  transformations
which  yielded  the  image  to  identify  the  visible  faces  of
the  3-D  polyhedra.   Recognition  consists  in  matching  the  resulting
3-D  structure  with  a  known  model,  a  generalization  of  template
matching.
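
The unifying effect of homogeneous coordinates can be shown in a short sketch. It is not Roberts' program; the matrix conventions, the focal distance `f`, and the sample point are assumptions made for illustration. Rotation, translation, and perspective projection are all written as 4 x 4 matrices and composed by multiplication; the only nonlinear operation is the final division by the fourth coordinate.

```python
import math


def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def rotation_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


def perspective(f):
    """Projection through the origin onto the plane z = f (assumed convention):
    (x, y, z) maps to (f*x/z, f*y/z, f) after the homogeneous division."""
    return [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 1 / f, 0]]


def apply(matrix, point):
    """Apply a 4 x 4 matrix to a 3-D point and divide by the fourth coordinate."""
    x, y, z = point
    X, Y, Z, W = (row[0] for row in matmul(matrix, [[x], [y], [z], [1]]))
    return (X / W, Y / W, Z / W)


# One matrix holds the whole chain: rotate the model, move it, project it.
camera = matmul(perspective(2.0),
                matmul(translation(0, 0, 4), rotation_z(math.pi / 2)))
image = apply(camera, (1, 2, 0))
print(image)  # approximately (-1.0, 0.5, 2.0)
```

Because the whole chain is a single matrix, composing or reordering the component transformations is a matter of matrix multiplication rather than of mixing linear and nonlinear formulas.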

Roberts'  method  cannot  correctly  label  picture  regions  as 
faces  of  blocks  in  a  3-D  scene  when  that  scene  contains  unknown 
types  of  polyhedra  or  known  types  which  partially  occlude  each 
other.   Guzman  (BIBTR  and  BIBSA  1968)  tried  to  overcome  these
weaknesses  as  well  as  avoid  the  computationally  expensive  numeri- 
cal approach  by  using  more  linguistic,  relational  methods.  Given 
the  line  drawing  of  a  scene,  Guzman's  SEE  program  does  not  use 
the  exact  coordinates  of  points  and  lines  as  does  Roberts'  program 
but  rather  the  geometric  relations  between  them.  SEE  concentrates 
on  points  at  which  several  picture  lines  intersect.  These  junctions 
are  the  images  of  polyhedral  vertices,  the  meeting  points  of 
several  faces.   The  acuteness  or  obtuseness  of  the  angles  between
these  junction  lines  depends  on  the  viewer's  position  relative  to
the  planes  of  the  polyhedron.   For  example,  when  looking  at  a 
corner  of  a  cubical  building  from  the  street,  the  two  visible 
roof  edges  and  the  intersection  of  the  walls  at  the  corner  form 
an  image  junction  which  resembles  an  upward  pointing  arrow.  After 
a  vertical  helicopter  ride  to  a  position  above  the  plane  of  the 
roof,  the  junction  resembles  a  two  pronged  fork;  the  angle  between 
the  shaft  of  the  arrow  and  its  barbs  has   changed  from  acute  to 
obtuse  and  the  roof  surface  has  become  visible.   These  and 
several  other  junction  configurations  are  recognized  by  Guzman's 
program.   From  observing  cases,  heuristics  are  derived  for  infer- 
ring whether  or  not  regions  on  either  side  of  a  junction  line 
should  be  considered  as  images  of  adjacent  faces  of  a  polyhedron. 
An  abstract  model  of  the  scene  is  constructed  by  representing  each 
region  as  a  node  in  a  graph  and  face  adjacency  is  denoted  by 
linking  appropriate  nodes.   The  phase  just  described  yields  a 
representation  based  on  information  local  to  junctions.  A  more 
globally  plausible  structure  is  derived  in  the  next  phase  by 
examining  the  graph  and  ruling  out  certain  unlikely  configurations 
such  as  incorrect  connections  between  distinct  polyhedra. 
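
The region graph can be sketched in a few lines. This is a much simplified illustration, not Guzman's SEE: the two linking rules shown (a fork links all three of its regions, an arrow links the two regions on either side of its shaft) are assumed stand-ins for his larger set of heuristics, the second, global phase is omitted, and the region names are hypothetical.

```python
from itertools import combinations


def links_from_junction(kind, regions):
    """Return the region pairs a junction suggests belong to one body.
    For a 'fork', `regions` holds the three regions meeting at the junction;
    for an 'arrow', the two regions on either side of the shaft."""
    if kind == "fork":
        return list(combinations(regions, 2))
    if kind == "arrow":
        return [tuple(regions)]
    return []  # other junction types contribute no links in this sketch


def group_bodies(all_regions, links):
    """Treat regions as graph nodes and return the connected components."""
    parent = {r: r for r in all_regions}

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path halving
            r = parent[r]
        return r

    for a, b in links:
        parent[find(a)] = find(b)
    bodies = {}
    for r in all_regions:
        bodies.setdefault(find(r), set()).add(r)
    return sorted(sorted(body) for body in bodies.values())


# Hypothetical scene: a block showing faces A, B, C, a second block showing
# faces D and E, and the background BG.
junctions = [("fork", ["A", "B", "C"]), ("arrow", ["A", "B"]),
             ("arrow", ["D", "E"]), ("tee", ["C", "D", "BG"])]
links = [pair for kind, regs in junctions
         for pair in links_from_junction(kind, regs)]
print(group_bodies(["A", "B", "C", "D", "E", "BG"], links))
```

Note that no coordinates appear anywhere: the grouping depends only on junction types and on which regions meet at them.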

The  graphical  representation  of  distinct  bodies  by  Guzman 
has  a  linguistic  flavor  and  is  a  higher  level  abstraction  than 
Roberts'  representation  of  examples  of  models.   This  permits 
analysis  of  much  more  complex  scenes  with  many  objects  partially 
occluding  each  other  and  containing  objects  never  seen  before. 
One  key  to  the  power  of  this  approach  is  that  junction  properties 
exploited  by  the  region-linking  heuristics  are  invariant  under  a 
large  class  of  2-D  and  projective  transformations.  Therefore, 
transformation  inversion,  which  Roberts  needed,  is  never  necessary.
Junctions  are  informationally  rich  because  they  capture  at  a 
single  point  the  relations  between  several  extended  parts  (the 
faces)  of  polyhedra. 

Guzman's  program  was  a  first  step  in  junction  analysis,  and 
it  had  several  unforeseen  weaknesses.   After  a  more  careful  analysis
of  scenes  and  their  images,  Rattner  (BIBTR  1970)  embodied  an
improved  set  of  heuristics  in  his  program  SEEMORE.   One  of  the
weaknesses  of  Guzman's  program  was  that  in  scenes  with  many 
objects  close  to  each  other,  shadows  are  thrown  across  faces  and 
their  edges  can  be  misinterpreted  as  the  images  of  polyhedral 
edges.   Orban  (BIBTR  1970)  studied  these  configurations  and 
developed  heuristics  for  detecting  and  removing  such  lines  from 
junctions. 

Waltz  (BIBTR  1972)  attacked  the  same  scene  analysis  domain 
as  Guzman,  Orban,  and  Rattner  but  instead  of  using  a  set  of 
heuristics  based  on  empirical  observations  he  analyzed  geometric 
relations  between  3-D  vertices  and  the  junctions  that  are  their 
projective  images.   His  fundamental  object  of  analysis  was  also 
the  junction  type,  but  instead  of  considering  only  shape,  he 
examined  the  3-D  configurations  that  could  give   rise  to  such 
shapes.   He  then  labelled  junction  lines  with  descriptors  of  the 
3-D  properties  they  could  correspond  to.   These  labels  were 
originated  independently  by  both  Huffman  (BIBSA  1971)  and  Clowes 
(BIBSA  1971)  to  distinguish  between  3-D  edges  which  are  concave, 
convex,  occluding  (bordering  surfaces),  or  cracks  between  objects.
Huffman  used  these  four  labels  to  determine  whether  or  not  line 
drawings  could  be  interpreted  as  pictures  of  "real"  objects. 
This  involves  attaching  labels  to  lines  and  progressing  to  other 
lines  through  junctions  until  all  lines  are  labelled.   The 
existence  of  one  kind  of  label  at  a  junction  constrains  the 
label  of  other  lines.   Any  line  can  only  be  assigned  a  single 
label.   If  it  is  impossible  to  label  a  picture  consistent  with 
these  constraints,  it  could  not  be  the  picture  of  a  real  object. 
An  example  is  the  "devil's  pitchfork",  a  picture  which  locally 
looks  "real"  but  is  inconsistent  globally.   Waltz  exhaustively 
examined  all  possible  views  of  vertices  where  three  (and  in  a 
few  cases,  more)  polyhedral  faces  met  and  labelled  the  lines 
accordingly.   For  any  particular  junction  shape,  a  large  number 
of  possible  labellings,  each  corresponding  to  a  distinct  3-D 
configuration,  is  possible. 

Waltz's  program  associates  with  each  junction  in  a  picture
a  list  of  all  its  possible  labellings.   A  labelling  is  ruled  out  if
some  adjacent  junction  has  no  labelling  compatible  with  it  along  the
line  they  share.   Waltz  originally  thought  a  tree  search  would  be 
necessary  to  examine  all  consistent  labellings,  with  the  accumula- 
tion of  interacting  constraints   leading  to  a  single  or  few 
possible  complete  picture  labellings.   To  his  surprise,  most 
junction  labellings  were  ruled  out  in  simply  progressing  from 
junction  to  junction.   The  reason  was  that  the  apparent  increase 
in  descriptive  complexity  is  more  than  offset  by  the  fact  that 
the  number  of  geometrically  realizable  labellings  of  any  parti- 
cular junction  shape  is  far  smaller  than  the  number  of  combinations 
possible  if  the  lines  could  be  independently  labelled.   Thus 
compatibility  between  adjacent  junctions  is  much  less  likely  than 
if  lines  were  labelled  independently.   Waltz  next  introduced 
shadow  labellings  for  lines;  exactly  the  same  phenomenon  occurred 
resulting  in  even  better  scene  analysis  with  no  tree  searching 
necessary.   Shadows,  instead  of  interfering  with  scene  analysis 
as  in  previous  approaches,  actually  contribute   information. 
Newborn  (BIBTR  1974)  embodied  Waltz's  picture  labelling  algo- 
rithms in  New  York  University's  high-level  set  theoretic  language 
SETL. 
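
The junction-to-junction elimination described above can be sketched as a small filtering loop. This is an illustration of the idea only: the data layout, the function names, and above all the toy dictionary of candidate labellings are assumptions, not Waltz's catalogue, and the direction attached to occluding labels is ignored, so that compatibility reduces to equality of the labels given to a shared line.

```python
def supported(junction, labelling, junction_lines, candidates):
    """True if every neighbour sharing a line with `junction` still has at
    least one candidate labelling giving that shared line the same label."""
    for pos, line in enumerate(junction_lines[junction]):
        for other, other_lines in junction_lines.items():
            if other == junction or line not in other_lines:
                continue
            other_pos = other_lines.index(line)
            if not any(c[other_pos] == labelling[pos] for c in candidates[other]):
                return False
    return True


def waltz_filter(junction_lines, candidates):
    """Repeatedly discard unsupported labellings until nothing changes."""
    candidates = {j: set(c) for j, c in candidates.items()}
    changed = True
    while changed:
        changed = False
        for junction in junction_lines:
            for labelling in sorted(candidates[junction]):
                if not supported(junction, labelling, junction_lines, candidates):
                    candidates[junction].discard(labelling)
                    changed = True
    return candidates


# Hypothetical picture: three junctions joined in a triangle by lines a, b, c.
junction_lines = {"J1": ("a", "b"), "J2": ("b", "c"), "J3": ("c", "a")}
# Toy dictionary: each tuple labels the junction's lines in the order above.
candidates = {
    "J1": {("+", "+"), ("-", "-"), ("+", "-")},
    "J2": {("+", "-"), ("+", "+")},
    "J3": {("-", "+"), ("-", "-")},
}
result = waltz_filter(junction_lines, candidates)
print(result)
```

In this toy case a single labelling survives at each junction without any tree search, which mirrors on a very small scale the behaviour that surprised Waltz.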

The  descriptive  labels  on  the  lines  of  an  analyzed  picture 
can  easily  be  used  to  derive  object  identity.   Waltz  goes  beyond 
Guzman,  however,  in  that  even  more  geometric  information  about 
the  3-D  scene  is  found.  Crude  shape  descriptors  for  edges  tell 
something  about  the  relative  positions  of  planes  in  the  3-D 
scene.   A  drawback  is  that  instead  of  a  small  number  of  junction
labels,  a  large  dictionary,  with  thousands  of  entries,  must  be 
stored  or  computed. 

Waltz's  improvement  over  Guzman  is  a  result  of  generalizing 
Guzman's  heuristics  which  were  seen  as  special  cases  of  the 
geometric  semantics  relating  3-D  vertices  and  their  2-D  images 
(junctions).   Mackworth's  program  POLY  (BIBSA  1973)  is  an 
extension  in  this  tradition.   His  approach  is  based  on  the 
following  geometric  representation.   Any  2-D  picture  of  a  3-D 
scene  consists  of  the  projection  of  the  latter's  visible  points
through  the  image  plane  to  a  single  special  point  called  the
viewpoint.   In  a  camera,  for  example,  the  photographic  film
occupies  the  image  plane  and  the  viewpoint  lies  in  the  lens.
The  2-D  image  of  a  line  in  the  3-D  scene  is  also  a  line.  This 
image  line  and  the  viewpoint  determine  a  plane  which  must  also 
contain  the  line  in  the  3-D  scene  which  gave  rise  to  the  image. 
This  is  called  the  "plane  of  interpretation"  by  Mackworth;  the 
polygonal  image  of  a  polyhedral  scene  yields  a  bundle  of  planes 
of  interpretation  all  of  which  contain  the  viewpoint.  Now,  if 
the  scene  consists  of  polyhedral  objects,  their  faces  intersect 
along  straight  line  edges  whose  images  determine  the  planes  of 
interpretation.   The  relations  between  the  orientations  of  these 
sets  of  planes  can  be  exploited  to  infer  the  former  from  the 
latter.   The  cumbersome  scene  representation  of  planes  as  sets 
of  points  is  avoided  by  representing  each  plane  as  a  single  point 
in  dual  space.   In  brief,   any  plane  in  three  space  can  be 
characterized  by  the  direction  of  its  normal  and  its  distance 
from  the  origin;  thus  in  dual  space  each  plane  is  represented 
by  a  point  at  the  tip  of  a  vector  starting  at  the  origin  pointing 
in  the  direction  of  the  normal  and  having  magnitude  equal  to  the 
distance  of  the  plane  from  the  origin.   Relations  between  planes 
have  the  following  representations  in  dual  space.   A  set  of 
parallel  planes  corresponds  to  a  set  of  points  lying  on  the  same 
line  through  the  origin.   The  intersection   of  a  number  of 
planes  in  a  single  point  (polyhedral  vertex)  corresponds  to  the 
vertices  of  a  planar  polygon  in  dual  space.   All  planes  of  inter- 
pretation are  represented  as  points  in  dual  space   lying  in  a 
single  plane;  that  plane  is  the  dual  of  the  viewpoint.   The  edges 
bounding  a  single  face  of  a  polyhedron  correspond  in  dual  space 
to  lines  passing  through  a  single  point,  the  dual  of  the  planar 
face. 
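
The representation of a plane by a single point can be sketched as below. The sketch follows the verbal description given above (normal direction scaled by distance from the origin) and checks only the statement about parallel planes; it is not Mackworth's code, and the function names and sample planes are assumptions.

```python
import math


def plane_to_dual(normal, distance):
    """Represent a plane by the tip of a vector from the origin that points
    along the plane's normal and has length equal to the plane's distance
    from the origin."""
    length = math.sqrt(sum(c * c for c in normal))
    return tuple(distance * c / length for c in normal)


def on_same_line_through_origin(p, q, tol=1e-9):
    """True if p, q, and the origin are collinear (zero cross product)."""
    cross = (p[1] * q[2] - p[2] * q[1],
             p[2] * q[0] - p[0] * q[2],
             p[0] * q[1] - p[1] * q[0])
    return all(abs(c) < tol for c in cross)


# Two parallel planes (same normal, different distances) and one tilted plane.
near = plane_to_dual((1, 2, 2), 3)
far = plane_to_dual((1, 2, 2), 6)
tilted = plane_to_dual((0, 0, 1), 3)

print(near, far, tilted)
print(on_same_line_through_origin(near, far))     # parallel planes
print(on_same_line_through_origin(near, tilted))  # non-parallel planes
```

Each plane, an infinite set of scene points, is thus reduced to three numbers, and a relation between planes becomes a relation between points.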

Mackworth's  program  POLY  operates  on  the  dual  space  representation
of  scenes  and  their  images  to  determine  edge  types
(such  as  Huffman  and  Clowes  types),  surface  orientation,  body
identification,  and  hidden  structure.   It  tries  to  link  neighboring
image  regions  by  labelling  the  lines  separating  them  as
the  images  of  "connect"  edges  in  the  scene,  i.e.  intersections
of  the  planes  whose  images  are  the  regions.   In  dual  space, 
identifying  a  connect  edge  between  two  planes  corresponds  to 
drawing  the  dual  of  the  edge  (a  line)  through  the  duals  of  the 
planes  (two  points).   Progressing  from  region  to  region  in  the
image  corresponds  to  connecting  points  with  straight  lines  in 
dual  space.   Constraints  imposed  by  incidences  in  the  image  have 
their  counterpart  constraints  in  dual  space,  preventing  many 
edges  from  being  interpreted  as  connect  edges.   Since  the  back- 
ground ordinarily  consists  of  a  single  plane,  the  program  begins 
by  attempting  to  connect  it  to  all  regions  bordering  it.  There, 
the  large  number  of  simultaneous  constraints  greatly  reduces  the 
amount  of  backtracking  necessary  in  searching  for  correct  picture 
interpretations.  The  eventual  aim  is  a  picture  in  which  every 
edge  is  labelled  as  connecting  or  occluding  two  neighboring 
regions.   Polyhedral  objects  in  the  scene  correspond  to  poly- 
hedra  in  dual  space;  completing  missing  portions   in  the  latter 
corresponds  to  solving  hidden  line  and  surface  problems  in  the 
former.   Pictures  for  which  this  is  impossible  are  generaliza- 
tions of  Huffman's  illegal  or  "nonsense"  sentences,  for  example, 
the  "devil's  pitchfork"  mentioned  in  the  discussion  of  Waltz' 
approach.

Mackworth's  purely  geometric,  projective  approach  appears 
superficially  to  be  a  step  backward  from  the  abstract,  symbolic 
approaches  of  Guzman  and  Waltz  to  the  numerical  methods  of  Roberts.
That  this  is  not  so  stems  from  Mackworth's  dual  space  representation,
in  which  points  correspond  not  to  points  in  the  scene  but
to  sets  of  points  (planes)  bearing  some  relation  to  each  other.
Roberts'  approach  was  based  essentially  on  transformation  and
template  matching,  though  with  a  novel  twist;  in  Mackworth's 
approach  object  identification  results  from  interpreting  the 
dual  graph  in  terms  of  connectivity  rather  than  shape  and  posi- 
tion of  picture  parts.   This  extraction  of  high-level  relations 
is  in  the  spirit  of  Guzman  and  Waltz  and  is  one  property  which 
characterizes  artificial  intelligence  approaches  as  opposed  to
pattern  recognition  approaches.   In  addition,  however,  the  dual
space  representation  can   yield  geometric  information  about  the 
scene  inaccessible  to  Guzman  and  Waltz.   Distances  and  angles  in 
dual  space  correspond  to  quantities  describing  geometric  rela- 
tions between  entities  in  the  scene.   Among  these  are  orientations 
of  faces  and  edges;  Waltz's  classification   of  these  was  restrict- 
ed to  binary  distinctions  such  as  convex  vs.  concave.  Waltz's 
dictionary  of  junction  types  is  a  list  of  special  cases  of 
phenomena  Mackworth  can  represent  in  their  entirety.   Waltz's 
surprise  at  the  small  number  of  realizable  parsings  of  sentences 
over  words  in  this  dictionary  would  have  vanished  if  he  had  known 
about  the  general  constraints  of  positioning  points  and  lines  in 
dual  space.   Mackworth's  dual  space  representation  is  capable  of
yielding  an  infinite  dictionary,  including  all  the  difficult  cases
of  polyhedral  vertices  formed  by  more  than  three  coincident  planes. 
Though  he  did  not  include  it,  there  is  room  for  representing 
shadows  as  projections  of  edges  through  the  light  source  onto  a 
surface  in  the  scene  and  then  through  the  viewpoint  to  the  image 
plane.   This  new  projection  point  (the  light  source)  ought  to  add 
many  of  the  scene  analysis  capabilities  of  binocular  visual  systems; 
the  dual  space  equivalent  of  this  point  is  a  plane  which  would 
further  constrain  relations  between  dual  space  points. 

The  polyhedral  scene  analysis  programs  just  described  (from 
Roberts  through  Mackworth)  are  characterized  as  "bottom  up". 
That  is,  before  scene  analysis  proper  begins,  small  regions  of 
high  local  contrast  (features)  are  found  and  linked  together  by 
least  squares  or  similar  statistical  methods  into  straight  line 
segments.   These  line  segments  are  the  primitives  on  which  the 
scene  analysis  programs  operate.   A  serious  drawback  of  this 
approach  is  that  feature  detection  is  very  sensitive  to  threshold 
and  appropriate  thresholds  may  differ  in  different  parts  of  the 
picture  depending  on  the  high-level  (semantic,  3-D  geometric) 
structure  of  the  scene.  This  high-level  information  cannot  be 
known  in  advance  of  the  low-level  or  line  finding  phase  of  the 
operation.   In  using  a  uniform  threshold  throughout  the  picture, 
high  sensitivity  yields  spurious  feature  detection  where  noise
is  present.   Noise  is  inevitably  high  due  to  camera  imperfections,
bad  lighting,  uneven  reflective  properties  of  objects  in  the
scene  and  digitization  truncations  (Horn,  BIBTR  1969).   Lowering
sensitivity  of  feature  finders  to  avoid  false  positive  detections 
results  in  failure  to  detect  faint  lines  between  regions  which  are 
the  images  of  nearly  equally  bright  adjacent  faces  of  a  polyhedron. 

Falk's  (BIBSA  1971,1972  and  BIBTR  1970)  approach  to  solving 
the  problems  described  in  the  preceding  paragraph  was  to  append  a
"top-down"  phase  to  the  bottom-up  phase.   Spurious  lines  are 
removed  and  missing  lines  are  added  after  the  initial  bottom-up 
phase  by  considering  the  initial  line  drawing  as  the  projected 
image  of  a  scene  consisting  of  a  collection  of  polyhedra  of  known 
types.   Inverting  the  scene-to-image  projection  in  the  manner  of 
Roberts  and  solving  hidden  line  and  surface  problems  leads  to  a 
best  model  of  the  scene  in   terms  of  the  known  polyhedral  types. 
Missing  or  extra  lines  resulting  from  the  bottom-up  phase  are 
added  or  deleted  respectively  to  make  the  final  line  drawing 
conform  to  the  model  of  the  scene.   Thus,  it  differs  from  Roberts' 
approach  in  that  a  Guzman-like  bottom-up  phase  first  predicts 
the  scene  objects  and  only  after  that  relatively  cheap  computation 
(list  processing  rather  than  matrix  operations)  does  the  projec- 
tive analysis  occur.   The  advantage  is  not  only  greater 
simplicity,  but  the  Guzman-like  phase  of  the  operation  is  far 
superior  in  analyzing  scenes  with  large  numbers  of  objects  which 
occlude  portions  of  each  other.   One  drawback  of  this  method  is 
that  the  scene  is  limited  to  collections  of  objects  from  a  fixed 
set.   The  hidden  line  and  surface  algorithms  are  also  computation- 
ally expensive. 

Shirai  (BIBTR  1972  and  BIBSA  1973)  overcame  many  of  the  drawbacks
of  the  bottom-up  approach  using  a  method  that  looks  deceptively
low-level,  but  succeeds  as  a  result  of  efficient  use  of
heuristics  based  on  high-level  properties  of  polyhedral  scenes. 
His  overall  strategy  is  to  proceed  from  the  most  distinct  (high- 
est contrast)  edges  to  the  least,  searching  for  lines  only  in 
places  suggested  by  earlier,  stronger  evidence.   Lines  are
classified  into  three  categories.   Those  in  the  first,  contour
lines,  are  characterized  at  the  low  (picture  processing)  level
as  consisting  of  picture  points  whose  grey  levels  contrast  greatly 
with  local  neighbors.   At  the  high  (3-D  scene  property)  level, 
contour  lines  correspond  to  the  separation  between,  for  example,
a  light  object  and  the  dark  background  behind  it.   The  second
category  consists  of  boundary  lines  which  are  characterized  at 
the  low  level  by  slightly  smaller  contrast  values;  at  the  high 
level  they  correspond  to  the  separation  between  an  object  and 
another  object,  or  between  an  object  and  the  background.  Contour 
lines  are  special  cases  of  boundary  lines.   The  third  category, 
internal  lines,  contains  those  exhibiting  least  contrast;  they 
correspond  to  lines  separating  adjacent  faces  of  a  body. 

Shirai's  process  is  begun  by  sampling  a  grey  scale  picture 
once  in  every  8x8  cell  region.   Samples  whose  grey  levels  differ 
greatly  from  the  picture  average  are  singled  out  as  contour  points; 
full  resolution  is  restored  and  the  same  criterion  is  applied  to 
find  contour  points  missed  by  the  original  sampling  in  the  8  x  8 
neighborhoods  of  those  found  by  the  original  sampling.   The  chains
of  contour  points  resulting  from  this  process  are  traced  and
broken  into  straight  lines  whose  endpoints  are  local  maxima  of
chain  curvature.   These  constitute  a  set  of  contour  lines,  strong
candidates  for  delineating  the  outlines  of  objects  in  the  image
of  a  scene  because  an  object  and  its  background  usually  differ 
more  from  each  other  in  optical  properties  and  position  relative 
to  the  light  source  than  adjacent  faces  of  the  same  object  or 
nearby  faces  of  neighboring  objects  of  a  similar  type.   This  is 
an  example  of  the  semantics  of  projective  geometry  relating  3-D 
objects  and  their  2-D  images  embodied  in  Shirai's  heuristics. 
Most  of  the  other  nine  heuristics  involve  attempts  to  extend  lines 
from  their  endpoints   or  find  lines  going  in  new  directions 
starting  at  the  endpoints  of  other   lines.   The  semantic  justi- 
fication for  these  heuristics  is  obvious  from  looking  at  the 
junction  types  of  Guzman  or  Waltz.   That  is,  lines  in  a  picture
are  images  of  polyhedral  edges.   Edges  terminate  in  vertices  with
other  edges.   Therefore  the  images  of  vertices  are  junctions  of  lines,  the
logical  place  to  look  for  extensions  of  old  lines  or  beginnings 
of  new  ones.   Since  an  occluded  neighboring  object  often 
contrasts  less  with  its   occluder  than  the   occluder  with  the 
background,  the  extension  of  a  contour  line  along  the  top  of  a 
TEE  junction  is  usually  a  boundary  line.   This  is  embodied  in 
heuristics  by  increasing  the  sensitivity  of  the  line  finder  when 
searching  for  the  extension.   Similarly  motivated  heuristics  are 
used  to  find  all  boundary  lines.   Finally,  the  sensitivity  is 
increased  even  more  to  find  internal  lines  at  likely  places  such 
as  the  stems  of  arrows. 
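
The first, coarse-to-fine stage of this process can be sketched as follows. The sketch follows the verbal description above and is not Shirai's program: the threshold value, the choice of the 8 x 8 cell beginning at each coarse sample as its "neighborhood", and the synthetic picture are assumptions. Chain tracing, line fitting, and the later heuristics are not shown.

```python
def contour_points(picture, cell=8, threshold=60):
    """Sample the picture once per `cell` x `cell` region, keep samples whose
    grey level differs from the picture average by more than `threshold`,
    then apply the same criterion at full resolution inside the cells of
    the samples that were kept."""
    rows, cols = len(picture), len(picture[0])
    mean = sum(sum(row) for row in picture) / (rows * cols)

    def differs(r, c):
        return abs(picture[r][c] - mean) > threshold

    coarse = [(r, c) for r in range(0, rows, cell)
              for c in range(0, cols, cell) if differs(r, c)]
    fine = set()
    for r0, c0 in coarse:
        for r in range(r0, min(r0 + cell, rows)):
            for c in range(c0, min(c0 + cell, cols)):
                if differs(r, c):
                    fine.add((r, c))
    return coarse, fine


# Hypothetical 16 x 16 picture: a bright square on a dark background.
picture = [[200 if 4 <= r < 12 and 4 <= c < 12 else 10 for c in range(16)]
           for r in range(16)]
coarse, fine = contour_points(picture)
print(coarse)     # coarse samples singled out
print(len(fine))  # full-resolution points found near them
```

Only one of the four coarse samples is singled out here, so full resolution is examined in a quarter of the picture. That economy is the point of the coarse pass: expensive processing is spent only where earlier evidence suggests it.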

The  final  product  of  Shirai's  system  is  a  line  drawing  of  a
scene,  with  lines  labelled  as  contours,  boundaries,  and  internal 
lines.   Individual  objects  (polyhedra)  are  identified  by  tracing 
contour  lines  and  their  extensions  into  boundary  lines;  this 
usually  yields  a  simple  closed  path  that  is  the  outline  of  an 
object.   The  selective  use  of  increased  line  finder  sensitivity 
in  places  determined  by  earlier  stronger  evidence  results  not 
only  in  considerable  computational  economy  but  also  in  the  avoid- 
ance of  false  positive  line  detection  in  noisy  regions.   There 
are  two  underlying  principles  behind  Shirai's  approach.   The
first  is  careful  exploitation  of  the  projective  semantics  which
relate  straight  line  edges  in  a  scene  to  their  straight  line
images.   The  other  is  the  constant  interplay  between  high  level
knowledge  of  scene  properties  and  low  level  line  following. 
This  interplay  is  an  example  of  heterarchical  rather  than 
hierarchical  control.   The  term  heterarchy  was  used  by  McCulloch 
(BIBGEN  1945)  to  describe  neural  networks  in  which  feedback 
destroys  the  usual  distinction  between  high  and  low  level  control. 
In  analogy  with  linear  systems  theory,  feedback  often  gives 
systems  greater  dynamic  range,  flexibility,  and  computation  power 
than  their  hierarchical  counterparts.   Minsky  (BIBTR  1970)
suggested  that  heterarchical  control  might  be  valuable  in  AI
systems.   This  contention  is  strongly  borne  out  by  Shirai's  system,
which  not  only  often  yields  far  better  line  drawings  with  less
computation  than  conventional  bottom-up  approaches  but  also
achieves  rather  good  high  level  separation  of  objects  and  identification
of  edge  types.   Remarkably,  this  higher  level  "semantic"
type  of  information  is  acquired  without  the  use  of  tree  searches, 
dictionaries  of  junction  types,  or  projective  transformations 
characteristic  of  other  approaches.   These  omissions  prevent 
Shirai's  system  from  separating  neatly  stacked  bodies  and  detecting 
certain  kinds  of  concavity,  i.e.  situations  which  are  not  pointed 
to  by  the  images  of  convex  edges;  the  lack  of  3-D  semantics  is  a 
limitation.   However,  Shirai's  program  could  be  used  as  a  preprocessor
for  bottom-up  programs,  superior  to  conventional  line-finders,
though  such  service  violates  the  spirit  of  heterarchy!

This  completes  discussion  of  polyhedral  scene  analysis  based 
on  straight  lines  and  their  junctions.   Combinations  of  this 
approach  with  semantic  methods  embodying  knowledge  of  physical 
structure  and  purpose  will  be  discussed  after  describing  other 
methods  which  can  also  be  so  combined.   The  first  of  these  is 
region  analysis. 


2.3.2   Region  Analysis

Minsky  and  Papert  (BIBGEN  1967)  suggested  avoiding  the  diffi- 
culties of  line  finding  by  constructing  the  regions  that  might  be 
bounded  by  such  lines  directly,  joining  neighboring  cells  which 
have  similar  light  intensities.   Brice  and  Fenema  (BIBSA  1970)
incorporated  this  suggestion  in  a  scene  analysis  system.   The 
first  step  is  to  scan  a  picture,  introducing  a  short  boundary 
between  any  two  neighboring  cells  whose  intensity  values  differ 
by  more  than  some  threshold.   The  magnitude  of  the  difference  in 
intensities  is  stored  in  association  with  each  such  boundary. 
In  terms  of  our  earlier  discussion  of  contrast  features,  this 
process  yields  first  order  differences,  analogous  to  first  order 
partial  derivatives  with  respect  to  the  coordinate  axes.   The 
resulting  boundaries  resemble  the  edge-elements  of  O'Gorman  and
Clowes,  yielding  a  picture  with  many  short-perimeter  regions
bounded  by  these  edge  elements.   Brice  and  Fenema's  departure
from  previous  contrast  detecting  methods  at  this  point  consists
in  merging  these  small  regions  according  to  heuristics  based  on
qualities  of  regions  rather  than  line-fitting.   The  first  of 
these,  the  phagocyte  heuristic,  merges  adjacent  regions,  erasing 
their  common  boundary  if  the  contrast  along  it  is  low  and  the 
ratio  of  its  length  to  the  perimeter  of  the  shorter  region  is 
large.   The  result  is  a  less  choppy  picture  with  many  of  the 
former  small  enclosed  or  almost  enclosed  regions  being  merged 
with  larger  ones.   Next,  a  weakness  heuristic  is  applied, 
merging  regions  if  the  ratio  of  the  length  of  low  contrast 
common  boundary  to  length  of  common  boundary  is  high.   This 
differs  from  the  first  in  operating  without  regard  to  perimeter 
length.   Thus  the  phagocyte  heuristic  cleans  up  small  enclosed 
islands  which  are  unlikely  to  be  real  region  outlines  and  the 
weakness  heuristic  reduces  the  tendency  to  see  spurious  large 
region  outlines  where  intensity  gradients  are  present  but  weak. 
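
The boundary-forming step and the two merging tests described above can be sketched as follows. This is a minimal illustration in Python, not the original program: the function names, the 4-neighbor scan, and the threshold parameters are assumptions introduced here, and the bookkeeping that tracks regions and their perimeters is left out.

```python
def initial_boundaries(image, threshold):
    """Place a short boundary between 4-neighbour cells whose intensities
    differ by more than `threshold`; store the contrast with each boundary."""
    boundaries = {}
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            # Look right and down so each neighbouring pair is visited once.
            for dr, dc in ((0, 1), (1, 0)):
                r2, c2 = r + dr, c + dc
                if r2 < rows and c2 < cols:
                    diff = abs(image[r][c] - image[r2][c2])
                    if diff > threshold:
                        boundaries[((r, c), (r2, c2))] = diff
    return boundaries


def phagocyte_merge(weak_len, perimeter_a, perimeter_b, ratio_threshold):
    """Phagocyte test (hypothetical parameterization): merge when the
    low-contrast common boundary is long relative to the shorter perimeter."""
    return weak_len / min(perimeter_a, perimeter_b) > ratio_threshold


def weakness_merge(weak_len, common_len, ratio_threshold):
    """Weakness test (hypothetical parameterization): merge when most of the
    common boundary is low contrast, regardless of perimeter length."""
    return weak_len / common_len > ratio_threshold


image = [[10, 10, 50],
         [10, 12, 50]]
edges = initial_boundaries(image, threshold=5)
```

In this toy picture only the two boundaries between the dark cells and the bright right-hand column survive the threshold; the small difference between 10 and 12 produces no boundary at all.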

The  next  step  in  Brice  and  Fenema's  scheme  is  to  find 
vertices  (places  where  three  regions  meet)  and  join  adjacent 
ones  along  their  connecting  region-boundary  with  straight  line 
masks.  The  latter  are  thin  rectangles  anchored  at  vertices; 
their  widths  represent  limits  of  acceptable  deviation  of  boundary 
points  from  a  straight  line  joining  vertices.   After  successful 
fitting,  each  mask  is  replaced  by  a  line  and  the  picture  is 
represented  as  a  line  drawing.   Guzman-like  techniques  are 
applied  to  propose  objects,  which  are  assumed  to  be  wedges,
cubes,  wall  or  floor.   The  last  two  constituents  are  also  sought 
on  the  basis  of  their  vertical  location  in  the  picture.   Line 
drawings  are  corrected  in  a  manner  similar  to  Falk's;  missing 
or  extra  lines  are  added  or  deleted  respectively  on  the  basis 
of  models  of  objects.   This  lack  of  object  generality  is  not
the  fault  of  region-growing  but  of  the  crudeness  of  higher-level
processing.   This  system  is  one  of  the  first  attempts  at  region-growing
and  predates  the  work  of  Waltz  and  Mackworth,  who  had  more 
sophisticated  high-level  representations. 
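
The straight line mask test can be pictured as a check that every boundary point lies within a thin rectangle joining two vertices. The sketch below is an assumed formulation in Python (the name `fits_line_mask` and the use of perpendicular distance are illustrative choices, not details taken from the original system).

```python
import math


def fits_line_mask(v0, v1, boundary_points, width):
    """Return True if every boundary point lies within width/2 of the
    straight line through vertices v0 and v1 (assumed mask criterion)."""
    (x0, y0), (x1, y1) = v0, v1
    length = math.hypot(x1 - x0, y1 - y0)
    if length == 0:
        return False  # coincident vertices define no line
    half_width = width / 2.0
    for x, y in boundary_points:
        # Perpendicular distance from (x, y) to the line through v0 and v1.
        dist = abs((x1 - x0) * (y - y0) - (y1 - y0) * (x - x0)) / length
        if dist > half_width:
            return False
    return True


straight = fits_line_mask((0, 0), (10, 0), [(2, 0.4), (5, -0.3), (8, 0.1)], 1.0)
bent = fits_line_mask((0, 0), (10, 0), [(2, 0.4), (5, 2.0), (8, 0.1)], 1.0)
```

When the test succeeds the boundary segment can be replaced by the single line joining the two vertices; when it fails, the boundary is presumably curved or contains an undetected vertex.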

In  deriving  the  line  drawings  of  polyhedral  scenes,  region 
growing  and  line-finding  should  yield  the  same  result.  A  picture 
specified  by  a  line  drawing  can  just  as  well  be  specified  by 
the  regions  those  lines  enclose.   However,  region-growing  ought
to  be  more  robust  than  line-finding  because  the  heuristics  which 
merge  regions  account  for  global  properties  of  intensity  distri- 
butions throughout  a  region.   This  is  usually  a  much  larger  area 
than  the  small  neighborhoods  used  in  line-finding  so  local  high 
spatial  frequency  noise  is  much  less  disruptive. 

The  usefulness  of  the  heuristics  in  the  region-growing 
method  just  described  is  predicated  on  the  existence  of  rather 
large  regions  with  simple  boundaries,  not  necessarily  straight 
lines.   Line  fitting  was  the  result  of  a  later  pass  designed 
specifically  for  polyhedral  scenes.   An  attractive  feature  of 
region  growing  is  that  it  may  be  applied  to  more  natural  scenes 
whose  images  contain  no  straight  lines.   Regions  may  correspond 
to  irregularly  shaped  areas  with  characteristics  distinct  from 
the  surround.   These  characteristics  need  not  be  restricted
to  light  intensity  but  may  include  texture,  color,  or  binocular
disparity  between  two  pictures.   The  characteristic  used  need  not 
have  constant  value  throughout  the  region  but  simply  be  free  from 
sharp  discontinuities.   Systems  embodying  general  region  analysis 
will  be  described  in  the  following  paragraphs. 

Krakauer  (BIBTR  1971)  embodied  a  unique  kind  of  region 
analysis  in  a  program  to  distinguish  between  various  fruits  such 
as  apples  and  pears.   This  task  is  representative  of  those  in 
which  humans  easily  recognize  objects  though  examples  within 
a  class  may  differ  in  ways  difficult  to  express  explicitly. 
Recognition  cues  seem  to  include  gross  2-D  image  shape,  3-D  shape 
and  reflective  properties  inferred  from  intensity  gradients  and 
roughness  or  texture.   All  of  these  are  incorporated  in  a  simple 
tree  structure  derived  from  an  intensity  contour  map.   The  tree's 
structure  can  be  visualized  by  considering  an  intensity  contour 
map  as  a  set  of  stacks  of  planar  regions  of  uniform  thickness, 
each  level  corresponding  to  picture  points  with  higher  light 
intensities  than  the  threshold  associated  with  that  level.   As 
examples,  a  uniformly  bright  disk  of  light  corresponds  to  a  stack 
of  poker  chips  and  a  disk  with  progressive  darkening  away  from 
its  center  corresponds  to  a  conical  stack.   Each  distinct
connected  region  (and  there  may  be  many  in  a  complex  picture)  at 
any  particular  level  corresponds  to  a  distinct  node  in  the  tree 
at  that  level.   If  one  region  supports  a  region  above  (that  is 
if  the  boundary  of  the  former  encloses  the  boundary  of  the  latter) 
a  path  connects  the  corresponding  nodes.   Thus,  the  tree  structure 
describes  set-nesting  relations  of  contour  curves.   A  picture 
of  a  "speckled"  object  corresponds  to  a  "bumpy"  contour  map  with 
many  disconnected  regions;  this  in  turn  corresponds  to  a  tree  with 
many  branches.   Krakauer  distinguishes  between  types  of  such 
texture  by  plotting  number  of  branches  against  tree  level.  This 
profile  has   characteristics  immune  to  geometric  transformation 
of  the  object  being  recognized,  changes  in  illumination,  and 
irrelevant  texture  details.   In  addition  to  the  structure  of  the 
tree,  each  node  can  carry  information  about  the  size  or  shape  of 
the  corresponding  region.   For  example,  pears  can  be  distinguished 
from  apples  by  the  eccentricity  or  elongation  of  regions  at  most 
levels.   The  advantage  of  this  kind  of  measure  is  that  statistics 
can  be  gathered  on  regions  at  fixed  levels  easily  even  if  their 
boundaries  are  not  "clean".    That  is,  ragged  edges,  holes,  and 
sharp  turns  need  not  be  eliminated  by  region  growing  heuristics 
in  order  to  measure  useful  properties  of  regions. 
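
The level-by-level structure of such a contour tree can be illustrated with a short Python sketch. It counts the connected regions above each intensity threshold, which gives the number of tree nodes per level; this is used here as a simple stand-in for the branch-count profile, and the function names and 4-connectivity are assumptions rather than details of Krakauer's program.

```python
def count_regions(image, level):
    """Count 4-connected regions of cells whose intensity exceeds `level`."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] > level and not seen[r][c]:
                count += 1
                seen[r][c] = True
                stack = [(r, c)]
                while stack:  # flood fill one region
                    y, x = stack.pop()
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] > level and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return count


def node_profile(image, levels):
    """Number of contour-tree nodes (distinct regions) at each level."""
    return [count_regions(image, level) for level in levels]


smooth = [[0, 0, 0, 0, 0],
          [0, 1, 1, 1, 0],
          [0, 1, 2, 1, 0],
          [0, 1, 1, 1, 0],
          [0, 0, 0, 0, 0]]
speckled = [[0, 0, 0, 0, 0],
            [0, 1, 1, 1, 0],
            [0, 2, 1, 2, 0],
            [0, 1, 1, 1, 0],
            [0, 0, 0, 0, 0]]
smooth_profile = node_profile(smooth, [0, 1])
speckled_profile = node_profile(speckled, [0, 1])
```

The smooth picture keeps a single region at both levels (a conical stack), while the speckled one splits into two regions at the higher level, so its tree branches.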

Barrow  and  Popplestone  (BIBSA  1971)  used  region  analysis  in 
a  somewhat  more  general  manner  than  Krakauer  to  recognize 
pictures  of  household  objects  including  a  teacup,  hammer,  wedge, 
spectacles,  pencil  and  the  like.   Regions  are  found  using  a 
merging  algorithm  in  which  all  cells  in  any  particular  region 
must  have  gray-levels  within  a  narrow  range.  When  ranges  overlap, 
regions  may  overlap.   Instead  of  defining  a  tree  using  set-contain- 
ment and  intensity  thresholds,  Barrow  and  Popplestone  define  a 
graph  which  expresses  more  general  relations  between  regions. 
Each  region  corresponds  to  a  node  and  links  between  nodes 
correspond  to  relations  between  regions  such  as  relative  position 
(above,  beside),  relative  size  (bigger),  distance,  and  shape  of
adjoining  boundary.   In  addition,  each  node  carries  descriptive 
information  about  the  corresponding  region  such  as  shape  and 
brightness.   These  cues  were  subjectively  chosen  to  correspond
to  those  which  might  distinguish  common  objects  in  cartoon-like 
drawings;  such  drawings  resemble  the  result  of  region  growing 
programs   operating  on  gray-level  pictures. 

Recognition  in  the  Barrow  and  Popplestone  approach  consists 
in  matching  the  relational  graph  of  a  picture  with  that  of  the 
model  it  resembles.   This  involves  matching  not  only  graph  shape 
but  also  the  attributes  attached  to  nodes  and  links.   The  model 
comes  from  training  sessions  in  which  samples  of  known  objects 
are  processed  to  gather  statistics  on  the  resulting  graphs.   The 
system  achieved  85%  correct  recognition  in  distinguishing  nine
household  objects,  using  an  average  of  five  minutes  of  computer 
time  per  picture  on  an  ICL  4130. 
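
A toy version of relational graph matching may make the idea concrete. The Python sketch below is an assumption-laden simplification: it compares attributes by exact equality and searches all node assignments by brute force, whereas the system described above compared against statistics gathered in training. The dictionary layout and the names `match_score` and `best_match` are hypothetical.

```python
from itertools import permutations


def match_score(picture, model, assignment):
    """Count agreeing node attributes and link relations under an assignment
    of picture nodes to model nodes."""
    score = 0
    for p_node, m_node in assignment.items():
        if picture["attrs"][p_node] == model["attrs"][m_node]:
            score += 1
    for (a, b), relation in picture["links"].items():
        if model["links"].get((assignment[a], assignment[b])) == relation:
            score += 1
    return score


def best_match(picture, model):
    """Try every one-to-one assignment; return (best score, assignment).
    Returns (-1, None) if the picture has more nodes than the model."""
    p_nodes = list(picture["attrs"])
    m_nodes = list(model["attrs"])
    best = (-1, None)
    for perm in permutations(m_nodes, len(p_nodes)):
        assignment = dict(zip(p_nodes, perm))
        score = match_score(picture, model, assignment)
        if score > best[0]:
            best = (score, assignment)
    return best


# Hypothetical model of a teacup and a picture graph with two regions.
teacup = {"attrs": {"body": "round", "handle": "elongated"},
          "links": {("handle", "body"): "beside"}}
picture = {"attrs": {"r1": "elongated", "r2": "round"},
           "links": {("r1", "r2"): "beside"}}
score, assignment = best_match(picture, teacup)
```

Exhaustive search over assignments grows factorially with the number of regions, which is consistent with the long running times reported for matching of this kind.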

Region  analysis  and  other  approaches  can  be  combined  with 
semantic  methods  to  improve  scene  analysis.   These  topics  will  be 
discussed  immediately  after  the  next  section  which  deals  with 
numerical  approaches  exploiting  geometric  semantics. 

2.3.3   Numerical  Region  Analysis  Methods  Using  Geometric  Semantics 

Though  the  region  analysis  methods  just  described  allow  great- 
er flexibility  in  scene  types  and  problem  solving  tools  than  line 
analysis  methods,  the  loss  of  projective  semantics  relating  scenes 
to  their  images  is  a  serious  drawback.   For  example,  a  shadow 
thrown  across  a  region  leads  to  its  erroneous  separation  into 
distinct  regions  in  the  graphical  description  of  the  picture. 
Occlusion  of  objects  by  others  has  similar  undesirable  results. 
For  this  reason  the  methods  of  Krakauer,  and  Barrow  and  Popplestone,
fail  to  correctly  identify  objects  when  applied  to  scenes  with 
many  objects.   The  ad-hoc  heuristics  on  which  recognition  is 
based  are  not  easily  extended  to  account  for  such  projective 
phenomena.   An  approach  by  Horn  (BIBTR  1970)  analyzes  regions  but 
in  a  strictly  numerical,  geometric  way,  accounting  precisely  for 
the  projective  relations  between  object  and  image.   The  goal  is 
to  infer  3-D  shape  of  smooth  objects  from  gentle  gradations  in
light  intensity  (shading),  as  opposed  to  sharp  edges,  in  their  images.
The  underlying  assumption  is  that  the  object's  surface  is
optically  uniform.   That  is,  in  approximating  any  small  area  of  a
surface  by  its  tangent  plane,  the  relative  light  intensity  of  its 
image  is  a  function  of  the  relative  directions  of  its  normal  and 
lines  to  the  light  source  and  observer.   Different  models  of 
surface  optical  properties  lead  to  different  functions.   To 
invert  such  a  function  and  infer  the  3-D  surface  shape  requires 
solving  a  first-order  nonlinear  partial  differential  equation 
which  can  be  reduced  to  a  system  of  five  ordinary  differential 
equations.   Horn's  rigorous,  mathematical  approach  not  only  enables 
him  to  incorporate  heuristics  to  tune  the  parameters  of  numerical 
methods  to  solve  the  system  of  equations  efficiently,  but  also 
leads  to  interesting  observations  on  the  significance  of  shading 
in  human  vision.   The  roles  of  cosmetic  makeup  and  lighting 
techniques  in  photography  are  discussed  in  terms  of  the  mathemati- 
cal model.   Horn's  approach  is  strictly  mathematical;  it  doesn't 
produce  the  abstract,  relational  data  structures  usually  associated 
with  AI.   Just  as  in  Mackworth's  approach,  however,  such  data 
structures  can  be  derived  from  the  inferred  3-D  shape.   For 
example,  the  visual  boundaries  of  any  smooth  convex  object  are 
characterized  by  surface  normals  perpendicular  to  the  line  of 
sight;  thus  the  normals  could  be  used  to  identify  individual 
objects.   Contour  maps  of  3-D  depth  of  objects  should  provide  a 
more  reliable  base  for  analyzing  relations  between  picture  regions 
than  simple  intensity  contour  maps  because  the  former  are  invariant 
under  3-D  transformations  with  suitable  change  in  viewing  position 
alone,  while  the  latter  are  not.   Shadows  cast  by  objects  onto 
others  are  used  to  infer  shapes  and  relative  positions;  they  do 
not  interfere  as  in  the  other  region  analysis  methods. 
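
For reference, the equations alluded to above can be written in the notation now standard for shape from shading. The symbols are assumptions introduced here, not taken from this survey: z(x, y) is the surface height, p and q are its partial derivatives, E(x, y) is the measured image intensity, R(p, q) is the reflectance function fixed by the surface model and lighting, and s is a parameter along a solution path. The Lambertian form of R, with light-source direction given by (p_s, q_s), is shown only as one example of a surface model.

```latex
% Image irradiance equation: a first-order nonlinear PDE in z(x, y)
E(x, y) = R(p, q), \qquad p = \frac{\partial z}{\partial x}, \quad q = \frac{\partial z}{\partial y}

% Equivalent system of five ordinary differential equations
% along a characteristic strip (solution path)
\frac{dx}{ds} = \frac{\partial R}{\partial p}, \qquad
\frac{dy}{ds} = \frac{\partial R}{\partial q}, \qquad
\frac{dz}{ds} = p \frac{\partial R}{\partial p} + q \frac{\partial R}{\partial q}, \qquad
\frac{dp}{ds} = \frac{\partial E}{\partial x}, \qquad
\frac{dq}{ds} = \frac{\partial E}{\partial y}

% Example reflectance function for a Lambertian (matte) surface
R(p, q) = \frac{1 + p\,p_s + q\,q_s}{\sqrt{1 + p^2 + q^2}\,\sqrt{1 + p_s^2 + q_s^2}}
```

Integrating the five equations from a starting point with known height and orientation traces out one path on the surface; many such paths together recover the shape.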

High  contrast  loci  (edges  or  region  boundaries  in  the
methods  discussed  earlier)  are  the  terminators  of  Horn's  solution
paths  rather  than  being  the  objects  of  attention  as  in  line-finding
methods.   In  Ramer's  approach  (BIBTR  1973  and  BIBPIC  1975)  they
play  the  latter  role,  but  in  scenes  which  can  be  as  general  as 
Horn's.   This  suggests  fruitful  combination  of  the  two  methods 
for  identifying  (isolating)  distinct  objects  in  real  world  scenes 
and  inferring  their  3-D  shapes.   The  capabilities  of  the  two 
methods  are  complementary.   Ramer  uses  Hueckel's  operator  (BIBPIC
1971,  1973)  to  detect  the  positions,  orientations,  and  strengths
of  high-contrast  edges  in  a  picture.   He  uses  Freeman's  chain-
encoding  to  approximate  their  orientations  and  links  short
chains  into  longer  ones  using  heuristics  based  on  statistical 
confidence  criteria  such  as  strength,  length  of  chain,  local 
signal  to  noise  ratio,  and  directions  of  adjacent  neighbors. 
Long  chains  resulting  from  this  process  are  classified  as  shadow 
edges,  cracks,  texture,  boundaries  of  bright  specular  reflection 
(highlights)  and  object  boundaries  on  the  basis  of  constituent 
edge  properties  and  relations  to  other  chains.   For  example, 
shadow  edges  are  generally  less  sharp  than  boundaries  of  objects 
in  a  scene;  the  chains  corresponding  to  the  former  are  therefore 
characterized  by  lower  contrast  features  and  a  broader  range  of 
local  variation  in  direction  of  constituent  edges.   Resulting 
chains  are  fitted  to  quadratic  curves;  this  is  the  simplest 
computational  extension  of  linear  interpolation.   The  increase  in 
computational  complexity  yields  more  general  boundary  curve  detec- 
tion.  Quadratic  curves  are  particularly  important  because  they 
are  the  images  of  sections  of  quadric  surfaces  (spheres,  cylinders, 
ellipsoids,  cones,  etc.)  which  bound  objects  in  a  more  general 
class  than  polyhedra.   Methods  for  fitting  quadric  surfaces  to 
objects  are  described  by  Agin  (BIBTR  1972)  and  Agin  and  Binford 
(BIBSA  1973).   In  a  sense,  Ramer's  approach  does  for  this  class 
of  objects  what  Shirai's  method  did  for  polyhedra;  namely,  it 
yields  a  classification  of  curved  edges  which  can  be  used  to 
determine  3-D  structure  of  objects  viewed.   The  tools  used  in 
these  two  cases  are  quite  different,  however. 
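
Freeman chain encoding represents a curve on a grid as a sequence of direction codes between successive cells. The Python sketch below shows the common 8-direction convention (0 for a step east, increasing counter-clockwise in 45 degree steps, with y increasing upward); the convention and the function name are assumptions for illustration, and how the codes were used inside Ramer's program is not reproduced here.

```python
# Freeman 8-direction codes keyed by (dx, dy); y is assumed to increase upward.
DIRECTIONS = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}


def chain_code(points):
    """Encode a path of 8-connected grid points as Freeman direction codes."""
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        step = (x1 - x0, y1 - y0)
        if step not in DIRECTIONS:
            raise ValueError("points must be distinct 8-connected neighbours")
        codes.append(DIRECTIONS[step])
    return codes


path = [(0, 0), (1, 0), (2, 1), (2, 2), (1, 2)]
codes = chain_code(path)
```

Each code quantizes the local edge orientation to one of eight directions, so runs of equal or slowly changing codes indicate smooth boundary segments.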

Another  purely  numerical  method  exploiting  the  projective 
semantics  relating  a  3-D  scene  to  its  2-D  image  is  used  by  Hannah
(BIBTR  1974).   While  Horn's  method  may  be  characterized  as 
region-growing  based  on  extending  solution  paths  of  partial 
differential  equations,  Hannah's  method  consists  of  region 
growing  based  on  intensity  profile  correlations  between  two 
pictures  of  the  same  scene  taken  from  slightly  different  positions. 
This  process  is  related  to  binocular  depth  perception.   In  the
first  phase,  a  small  region  (window)  with  distinctive  statistical
characteristics  (e.g.  large  gray-level  variance)  is  found  in  one
picture.   Looking  only  at  parts  of  the  other  picture  with  the  same
characteristics,  an  attempt  is  made  to  correlate  the  original 
window  with  a  window  in  the  other  picture.   When  this  succeeds, 
the  two  windows  are  considered  as  being  in  registration  with 
each  other,  i.e.  they  correspond  to  two  views  of  the  same  part
of  the  scene.    A  region  is  grown  around  this  window  by  moving 
the  windows  in  both  pictures  the  same  distance  and  direction. 
When  an  abrupt  change  in  3-D  depth  occurs,  such  spreading  will 
fail  because  of  binocular  disparity,  a  shift  in  the  images 
relative  to  their  original  registration  positions.   Thus  each 
region  eventually  found  corresponds  to  a  surface  in  space  bounded 
by  abrupt  depth  changes.   Correlation  and  search  for  regions  in 
registration  with  each  other  is  extremely  expensive  computationally. 
Hannah  greatly  reduces  the  number  of  computation  steps  necessary
by  exploiting  statistical  properties  of  the  picture,  using  fast 
algorithms  such  as  the  Fast  Fourier  Transform,  and  restricting 
the  search  to  areas  deemed  likely  by  scene  and  image  geometry. 
The  latter  involves  employment  of  a  Camera  Model,  using  techniques 
developed  by  Sobel  (BIBTR  1970  and  BIBSA  1973,  1974).   This  is
a  mathematical  model  which  relates  camera  positions  to  image 
properties.   For  example,  an  image  point  in  one  picture  corres- 
ponds to  a  point  in  three  space  which  could  be  located  anywhere 
on  a  line  of  sight  in  three  space.   The  image  of  this  line  of
sight  in  a  second  picture  is  a  line  rather  than  a  point  if  the 
camera  is  not  in  the  same  position  in  both  cases.   In  searching 
the  second  picture  for  the  corresponding  image  of  a  point  in  the 
first  then,  one  need  only  look  along  this  line  even  if  the 
depth  of  the  point  in  the  scene  is  completely  unknown.   Guesses 
about  its  depth  correspond  to  restricting  the  search  to  segments 
of  this  line.   Thus,  the  fact  that  straight  lines  are  preserved 
as  a  class  under  projective  geometric  transformations  can  be 
exploited  to  great  advantage  even  when  there  are  no  straight 
lines  in  the  3-D  scene  or  its  images. 
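
The restricted search can be sketched in Python under a simplifying assumption: the two cameras are displaced horizontally, so that the image of the line of sight is simply the same row in the second picture. The function names, the use of normalized correlation as the matching score, and the `max_disparity` bound (standing in for a guess about depth) are all illustrative choices, not details of Hannah's program.

```python
import math


def normalized_correlation(a, b):
    """Normalized cross-correlation of two equal-length intensity windows."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0  # a flat window carries no information to correlate
    return sum(x * y for x, y in zip(da, db)) / denom


def best_disparity(left_row, right_row, start, width, max_disparity):
    """Search one row of the second picture for the window that best matches
    left_row[start:start + width]; return (disparity, correlation)."""
    window = left_row[start:start + width]
    best_d, best_score = None, -2.0
    for d in range(max_disparity + 1):
        pos = start - d
        if pos < 0:
            break
        score = normalized_correlation(window, right_row[pos:pos + width])
        if score > best_score:
            best_d, best_score = d, score
    return best_d, best_score


left_row = [5, 5, 5, 5, 9, 1, 7, 5, 5, 5]
right_row = [5, 5, 9, 1, 7, 5, 5, 5, 5, 5]  # same pattern shifted by two cells
disparity, score = best_disparity(left_row, right_row, start=4, width=3, max_disparity=3)
```

Because the search is confined to a short segment of one line, the cost per window is proportional to the disparity range rather than to the area of the second picture.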

A  property  that  makes  Hannah's  approach  more  powerful  than
any  discussed  so  far  is  that  her  region  analysis  is  not  impaired 
by  camouflage,  shading,  texture,  sharp  shadows  or  intensity  con- 
trast edges  because  it  is  not  based  on  intensity  levels  and 
their  local  variations  but  on  correlations  between  arbitrarily 
structured  intensity  distributions  in  two  pictures.   She  has 
exploited  a  binocular  geometric  semantics  which  appears  to  be 
much  more  powerful  than  the  monocular  geometric  semantics  of 
other  methods.   We  will  now  examine  some  other  kinds  of  semantics 
used  in  scene  analysis. 

2.3.4    Incorporation  of  General  Semantics  and  Mechanical  Aids 

Feldman  and  Yakimovsky  (see  BIBTR  and  BIBSA,  both  authors, 
either  order  and  several  titles  relating  to  Semantic  Region  Grow- 
ing) improved  the  results  of  region  growing  by  using  merging  rules 
based  on  semantic  (real  world)  phenomena  in  the  scene.   The
first  step  is  to  conservatively  apply  non-semantic  region  growing
rules  with  merging  based  on  similarity  of  color  as  well  as
light  intensity.   The  result  is  a  large  number  of  small  regions. 
Next,   semantic  meanings,   depending  on  the  problem  domain, 
are  attached  to  regions.   For  example,   in  a  typical  view 
through  the  windshield  of  a  car,  blue  regions  at  the  top  of  the 
picture  are  considered  sky;  they  are  merged  with  each  other  and 
white  (cloud)  regions  which  may  intervene.   At  the  bottom  of  a
picture,  green,  yellow  and  brown  regions  are  interpreted  as 
grass  and  merged  together.   Regions  in  the  middle  of  the  road 
with  distinctive  shapes  are  identified  as  cars;  in  this  case 
merging  proceeds  as  long  as  the  geometric  shape  of  the  regions 
better  approximates  that  of  known  cars.   Similar  criteria  are
evaluated  in  merging  regions  in  other  parts  of  the  picture.  The 
criteria  are  based  on  training  sessions  in  which  the  trainer 
evaluates  results  of  certain  merges.   In  the  actual  recognition 
process,  a  Bayesian  strategy  is  followed  to  maximize  the  probability
of  making  the  best  choice.   The  end  result  is  a
partitioning  of  the  picture  into  regions,  each  of  which  is 
labelled  according  to  its  meaning  in  the  real  world.   Regions 
are  identified  much  more  realistically  than  in  methods  where 
semantics  are  not  employed  in  the  region  growing  rules.

In  the  VISIONS  system,  Hanson  and  Riseman  (BIBTR  1974)  incor- 
porate a  heterarchical  combination  of  high-level  (semantic) 
and  low-level  (geometric  feature  extraction)  methods  using 
parallel  computation.   At  the  semantic  level,  objects  which 
might  be  in  a  scene  are  modelled  by  abstract  relational  struc- 
tures similar  to  those  of  Barrow  and  Popplestone  or  Yakimovsky. 
Low  and  high-level  processes  interact  in  the  following  manner. 
First,  low-level  processors  detect  features  which  suggest  a 
subset  of  objects  which  might  be  present  in  a  scene.   The  corres- 
ponding semantic  models  are  selected;  they  direct  low-level 
processors  to  seek  features  which  would  confirm  or  deny  model 
validity.   In  the  case  of  denial,  new  features  found  could  be 
used  to  suggest  other  models,  and  so  forth.   The  low-level 
feature  detection  capabilities  are  very  flexible  and  include 
detection  of  local  color,  texture,  brightness,  and  contrast. 
The  selection  of  particular  features  at  the  direction  of  high
level  models  greatly  improves  efficiency  by  restricting  the 
search  to  relevant  information. 

Winston  (BIBTR  1970)  developed  a  system  which  learns  to 
recognize  complex  high-level  structures  constructed  of  blocks 
(polyhedra).   Feature  extraction  and  line  detection  are  low-level
processing  whereas  identifying  polyhedra,  as  in  the  programs  of
Guzman  and  those  who  followed,  is  considered  high-level  processing.
An  even  higher  level  description  of  a  scene  is  one  in
which  the  blocks  form  structures  such  as  arches,  tables  and  towers. 
Such  structures  are  represented  in  Winston's  system  by  graph  data 
structures  whose  major  nodes  correspond  to  substructures  and  links 
between  nodes  are  relations  such  as  is-supported-by  or 
is-a-part-of.   Additional  nodes  correspond  to  blocks  and  are
linked  to  other  nodes  describing  their  attributes  (posture,  type
of  polyhedron).   Different  kinds  of  structures  have  different
kinds  of  graphs,  but  many  times  one  type  of  structure  can  have 
variations  yielding  different  graphs.   The  goal  is  to  train  the 
system  to  distinguish  which  of  these  variations   are  crucial  in 
order  to  correctly  identify  structures  consisting  of  arrangements 
of  blocks.   It  is  accomplished  by  presenting  the  learning  program 
with  "near  miss"  examples,  those  for  which  small  changes  transform
one  kind  of  structure  into  another.   Thus  instead  of  using  a 
statistical  training  approach  which  would  blur  critical  distinc- 
tions, the  latter  are  singled  out  for  special  attention. 

Winston's  approach  is  closely  related  to  Minsky's  idea  of
frames  (BIBTR  1974),  a  relational  data  structure  for  representing
certain  global  situations  or  scenes.   The  frame  is  subject  to
transformations  which  are  considered  to  be  minor  perturbations  of
a  single  global  gestalt.   The  nature  of  the  transformations  is 
complex,  however,  involving  alterations  of  the  list  structure 
that  may  be  interdependent  and  difficult  to  anticipate.  The 
semantic  relations  transcend  those  of  the  real  world  in  AI.   This
very  interesting  and  difficult  topic  lies  at  the  heart  of  much 
current  AI  research  but  is  beyond  the  intended  scope  of  this 
survey;  for  further  discussion  the  reader  is  referred  to  the 
general  AI  texts  in  BIBGEN,  to  Minsky,  Papert  and  Winston  (BIBTR, 
all  references  to  any  of  the  names)  and  in  particular  to  Winston
(BIBSA  1975).

The  aim  of  scene  analysis  is  machine  comprehension  of  real 
world  scenes,  hopefully  under  natural  lighting  conditions.   Sometimes,
however,  an  immediately  useful  application  is  desired,  for
example  in  robot  assembly  of  parts  in  a  factory,  and  one  must 
fall  back  on  less  sophisticated  methods  for  determining  3-D 
configurations.   These  include  touch  sensors  attached  to  probes 
to  mechanically  measure  distance,  sonar,  and  artificial  lighting 
situations.   The  latter  exploit  highly  structured  lighting  which 
is  usually  used  to  infer  distance  by  triangulation  between  the 
known  position  and  direction  of  the  light  source,  position  and 
direction  of  the  camera,  and  the  image  of  the  light  on  an  object. 
Agin  and  Binford  (BIBSA  1973)  use  a  laser  in  such  a  system  at 
Stanford  University.   Shirai  and  Suwa  (BIBSA  1971)  use  a  slit 
beam  for  range  finding  by  triangulation  in  scenes  with  polyhedra. 
Will  and  Pennington  (BIBSA  1971)  illuminate  a  scene  with  striped 
lighting  and  infer  planar  face  orientation  from  properties  of 
the  2-D  spatial  Fourier  transform.   All  of  these  methods  represent
a  retreat  from  the  tradition  of  analyzing  natural  scenes  under
natural  lighting.   In  part,  this  retreat  is  an  admission  of  the 
failure  of  attempts  to  correctly  analyze  any  but  the  most  arti- 
ficial of  scenes.   Successful  scene  analysis,  as  modest  as  it  is, 
is  achieved  only  at  great  computational  expense.   Evidence  for 
the  latter  is  expressed  by  Smith  and  Coles  (BIBGEN  1973)  who  report 
that  analyzing  a  single  complete  scene  requires  ten  minutes  of 
central  processor  time  on  a  large,  high-powered  computer  at 
Stanford  University.   This  is  hopelessly  slow  for  a  practical 
robot  to  skillfully  navigate  a  fork-lift  loader,  let  alone  drive 
a  car.   In  the  next  chapter  possible  ways  of  overcoming  this 
frustrating  limitation  will  be  examined. 
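
The triangulation used by the structured-lighting range finders mentioned above reduces to elementary trigonometry. The Python sketch below assumes a planar geometry in which the light source and camera sit at the two ends of a known baseline and each ray makes a known interior angle with that baseline; the function name and parameterization are illustrative and are not taken from any of the cited systems.

```python
import math


def range_by_triangulation(baseline, source_angle, camera_angle):
    """Perpendicular distance from the baseline to the illuminated point.

    baseline     -- distance between light source and camera
    source_angle -- angle (radians) between the baseline and the light beam
    camera_angle -- angle (radians) between the baseline and the camera ray
    """
    total = source_angle + camera_angle
    if not 0 < total < math.pi:
        raise ValueError("rays do not intersect in front of the baseline")
    # Law of sines applied to the source-camera-point triangle.
    return baseline * math.sin(source_angle) * math.sin(camera_angle) / math.sin(total)


distance = range_by_triangulation(2.0, math.radians(45), math.radians(45))
```

With a baseline of 2 units and both rays at 45 degrees, the illuminated point lies 1 unit from the baseline. Each measurement costs only a few arithmetic operations, which is what makes such methods attractive when analysis under natural lighting is too slow.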


CHAPTER  3.   OVERVIEW

The  performance  of  artificial  visual  systems  is  far  below  even 
that  of  rather  primitive  natural  visual  systems.   Though  machine 
recognition  of  objects  could  be  improved  by  combining  several  of 
the  methods  discussed  previously,  the  result  would  be  a  cumber- 
some system  requiring  large  amounts  of  computer  time  and  memory 
space.  The  expense  and  physical  size  of  such  a  system  could  of 
course  be  significantly  reduced  by  the  currently  decreasing  cost 
and  size  of  computer  components.   However,  speed  cannot  be  signi- 
ficantly improved  using  conventional  methods;  without  a  thousand- 
fold speedup,  mobile  robots  which  interact  with  reasonable 
environments  are  impossible.   This  is  precisely  the  kind  of 
impasse  which  haunts  many  other  areas  in  AI;  methods  successful
in  toy  worlds  do  not  generalize  effectively  to  real  ones  without 
drastic  reduction  in  performance.   The  combinatoric  explosion 
which  occurs  when  complexity  increases  in  generalizing  to  real 
world  situations  is  often  combatted  by  embodying  semantics  of 
the  real  world  in  heuristics.   The  way  in  which  such  semantics 
are  represented  is  crucial  and  there  are  no  general  rules  for 
choosing  either  the  semantics  or  their  best  representation.  This 
situation  applies  to  the  linguistic  approach  to  scene  analysis 
as  well.   There,  objects,  structures,  and  functional  aggregates
of  objects  correspond  to  nodes  in  a  graph  whose  links  correspond
to  relations  between  these  objects.   As  the  number  of  nodes
increases  in  generalizing  to  more  complex  situations,  the  number
of  potential  links  increases  exponentially.   The  problem  of
deciding  which  links  are  relevant  becomes  increasingly  difficult
and,  in  dynamic  situations,  relevance  may  change  in  nonobvious
ways.   Minsky's  concept  of  frames  represents  an  attempt  to
isolate  subparts  of  the  real  world  and  thereby  reduce  linkage 
complexity. 

The  problems  discussed  above  are  general  to  AI  and  it  appears, 
regrettably,  that  many  artificially  intelligent  systems  cannot 
be  significantly  generalized  from  their  toy  worlds.   Chandrasekaran 
and  Reeker  (BIBGEN  1972)  and  Dreyfus  (BIBGEN  1972)  discuss 
some  apparent  limitations  in  AI  research.   However,  in  scene
analysis  it  appears  that  breakthroughs  can  be  realized  by  proper
exploitation  of  geometric  semantics.   Since  the  phenomena  of
projective  transformations  are  completely  expressible  mathemati- 
cally, their  semantics  are  much  more  tractable  than  those  of 
other  AI  areas.   Any  mathematical  system,  however,  can  be 
expressed  in  a  large  number  of  different  ways  equivalent  formally 
but  enormously  different  in  terms  of  the  complexity  of  carrying 
out  certain  kinds  of  computations.   Roberts,  Mackworth,  Horn  and 
Hannah  explicitly  used  projective  semantics  but  in  completely 
different  representations,  and  for  different  purposes.   Is  there 
a  better,  more  universal  representation?   Biological  examples
may  help  answer  that  question.   Examination  of  biological 
processes  has  not  been  profitable  in  most  areas  of  AI  because 
very  little  is  known  about  neurophysiological  correlates  of 
conceptual  phenomena,  e.g.  words,  thoughts  or  logic.   In  vision,
however,  much  is  known  about  the  neurophysiology  and  its  low
level  computational  capabilities.   Computational  geometry  is  a
term  used  by  Minsky  and  Papert  (BIBSA  1969)  to  describe  the
logical  mechanisms  which  are  used  to  deduce  geometric  properties 
in  quantized  spaces.   They  discussed  computations  for  topological 
geometry.   The  computational  geometry  of  some  affine  transformations
is  discussed  in  Weiman  and  Rothstein  (BIBTR  1972).   Lateral
inhibition  is  a  good  example  of  biologically  observed  geometric 
computation.  This  is  the  inhibition  of  activity  of  neighboring 
nerve  cells  by  an  active  cell  in  a  network.   It  is  a  phenomenon 
common  to  many  sensory  nerve  networks;  in  vision  it  provides  a 
mechanism  for  smoothing  and  taking  spatial  derivatives  just  like 
the  finite  differencing  operators  mentioned  in  the  discussion  of 
contrast  features.   Marr  and  Pettigrew  (BIBTR  1973)  and  Marr 
(BIBTR 1974) have examined more subtle geometric computation
capabilities  of  neurophysiological  structures  in  the  visual 
system. 
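
The finite-difference character of lateral inhibition can be
illustrated with a minimal sketch in Python. It assumes a
one-dimensional row of cells in which each cell is inhibited by
its two immediate neighbors with a single uniform weight; the
function name, the weight, and the treatment of the end cells are
illustrative assumptions, not a model taken from the cited work.

```python
def lateral_inhibition(signal, inhibition=0.5):
    """Each cell outputs its own input minus a fraction of the inputs
    of its two neighbours (end cells reuse their own value)."""
    n = len(signal)
    out = []
    for i in range(n):
        left = signal[max(i - 1, 0)]
        right = signal[min(i + 1, n - 1)]
        out.append(signal[i] - inhibition * (left + right))
    return out


# A step in intensity: dark cells followed by bright cells.
step = [1.0] * 5 + [3.0] * 5
response = lateral_inhibition(step)
```

With a weight of 0.5 the response vanishes over uniform regions
and is nonzero only on either side of the step, so the network
behaves like a (negated, scaled) second-difference operator.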

Some  of  the  problems  of  inferring  3-D  scenes  from  2-D  pictures 
involve  the  solution  of  partial  differential  equations  (PDE's) 
relating geometric quantities. In Horn's approach the differential
quantities  were  components  of  tangents  to  surfaces  and  gradations 
in  image  intensity.   Though  Hannah's  method  does  not  use  PDE's 
explicitly,  the  binocular  disparities  used  as  a  basis  for  region 
analysis  are  angle  differentials  which  are  related  to  depth 
differentials.   In  fact,  the  relations  between  those  quantities 
have  been  expressed  in  terms  of  differential  geometry  by  Luneburg 
(BIBGEN  1947)  to  yield  some  rather  remarkable  and  counter-intuitive 
conclusions  about  our  binocular  perception  of  visual  space. 
A  closely  related  topic  that  is  virtually  unexplored  but  probably 
fruitful  is  projective  differential  geometry  (Wilczynski  BIBGEN 
1905). A computational advantage of representing pictures in
discrete space (for example a retina or a digitized picture) is
that derivatives of various orders correspond to finite differences
and the operations of calculus are easily approximated by simple
arithmetic operations, as we saw in the discussion of contrast
detectors. This fact is exploited in numerical analysis, which
uses grids for the numerical solution of PDE's whose analytic
solution is unknown or intractable. These numerical solutions
are usually computed sequentially, one cell at a time. In the
biological case, many nerve cells are active simultaneously.
This kind of parallel processing gives biological organisms the
capability of reacting much faster than artificial visual
systems despite the fact that biological components are much
slower. Parallel computation can not only increase speed but
often completely changes the expression of algorithms (Traub,
BIBGEN 1973). In many cases, complex sequential algorithms
can be expressed in terms of a large number of simple,
cooperating, simultaneously active (parallel) algorithms. This is
particularly relevant to the solution of PDE's; algebraically
complicated global functions are often the solutions of simple
differential (local) equations. Recently, formal models of
biological developmental processes have been generalized and
studied in the theory of formal languages. Called L-systems
after the biologist Lindenmayer who originated them, these
systems are important to the theory of parallel computation
as well as biological models (Herman and Rozenberg, BIBGEN 1975).
Horn (BIBPIC 1974) has recently incorporated parallel, discrete
algorithms in a system for inferring illumination of surfaces
from  image  intensities.   Since  image  intensity  is  a  function  of 
surface  orientation  and  illumination,  this  problem  is  closely 
related  to  his  Ph.D.  thesis  topic  discussed  earlier.  Remarkably, 
his  approach  explains  perceptual  phenomena  previously  poorly 
modelled  and  points  to  actual  neurophysiological  components  of 
the  mammalian  retina  which  could  be  carrying  out  the  parallel 
algorithms. 
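
The remark that a simple local equation, applied at every cell
at once, produces a complicated global solution can be sketched
with Jacobi relaxation for Laplace's equation on a grid. This is
a standard numerical scheme chosen here for illustration; it is
not the algorithm of Horn or of Traub, and the grid size, the
boundary values, and the function names are assumptions. The
loop below runs sequentially, but each update reads only the
previous grid, so all cells could be updated simultaneously.

```python
def jacobi_step(grid):
    """One parallel-style update: every interior cell becomes the mean
    of its four neighbours, all read from the previous grid."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            new[r][c] = 0.25 * (grid[r - 1][c] + grid[r + 1][c]
                                + grid[r][c - 1] + grid[r][c + 1])
    return new


def relax(grid, tolerance=1e-6, max_steps=10_000):
    """Repeat the local update until no cell changes by more than
    `tolerance` (or the step limit is reached)."""
    for _ in range(max_steps):
        new = jacobi_step(grid)
        change = max(abs(a - b)
                     for row_a, row_b in zip(new, grid)
                     for a, b in zip(row_a, row_b))
        grid = new
        if change < tolerance:
            break
    return grid


# Boundary condition (assumed): top edge held at 1.0, other edges at 0.0.
size = 7
grid = [[0.0] * size for _ in range(size)]
grid[0] = [1.0] * size
solution = relax(grid)
```

The local rule involves only four neighbors and one division,
yet the converged grid approximates a smooth global function
determined entirely by the boundary values.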

The  parallel,  numerical,  PDE  approaches  just  discussed  in  a 
sense  deal  with  what  could  be  called  lower  level  data  in  vision. 
That  is,  they  show  how  3-D  depth  contours  can  be  inferred  but  do 
not  deal  directly  with  the  higher  concepts  of  shape  recognition, 
isolation  of  distinct  bodies,  and  determination  of  relations 
between  objects.  Global  abstractions  are  considered  in  gestalt 
psychology  as  being  the  result  of  associating  parts  on  the  basis 
of  something  they  have  in  common  (Haber  and  Hershenson,  BIBGEN 
1973). This includes grouping parts that are geometrically close
to each other and grouping together parts that are similar to
each other. Lester (BIBTR 1974 and 1975) has quantified the
concepts of proximity and similarity in programs which group
together very general kinds of geometric objects. Though he only
considered 2-D pictures, application of this kind of idea to
inferred 3-D structures could be useful. Hannah's approach
implicitly joins regions on the basis of lateral and depth
proximity while other region growing methods use a combination of
similarity and proximity. If one generalizes "common fate" gestalt
grouping  concepts  to  include  "common  transformation"  the  role 
of motion in vision becomes very important. This is an area
which has been virtually ignored in artificial vision, which
usually concentrates on analyzing form in static pictures and
regards motion as an interpolation between two static pictures.
This ignores completely the fact that motion detectors are
present in large numbers in all mammalian visual nervous
systems and that when images are stabilized on the human retina,
vision ceases. The roles usually ascribed to motion detectors
are  providing  feedback  for  eye  movements  and  attracting  the  eye 
to  parts  of  a  scene  which  may  move.  Though  there  is  certainly 
great  adaptive  value  in  these  roles,  motion  detectors  could  also 
play a fundamental part in the perception of geometric form and
in inferring 3-D shape. Platt (BIBGEN 1960) has shown how null
responses from motion detectors when a pattern is moved on a
retina can be used to recognize straight lines and circles as a
result of self-congruence properties of such figures under certain
transformations. These particular figures are of central
importance in projective geometry. Generalizing this motion to three
dimensions, Johansson (BIBGEN 1975) presents strong experimental
evidence  that  human  assumption  of  3-D  self-congruence  of  objects 
in  a  scene  plays  a  powerful  role  in  interpreting  motion  visible 
only  on  a  2-D  display.   This  characteristic  is  exploited  in  the 
kinetic  depth  effect  in  computer  graphics.   When  a  2-D  image  is 
transformed  as  it  would  be  if  the  rigid  3-D  object  it  represents 
were rotated in space, the image immediately "looks" 3-D
(Newman and Sproull, BIBGEN 1973). Human perception of moving
objects  does  not  appear  to  be  a  succession  of  static  frames;  in 
real  life  stagecoach  wheels  do  not  appear  to  rotate  backwards 
due  to  frame  strobing  as  in  motion  pictures. 
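
Grouping by proximity, as described in this paragraph, can be
sketched as follows. The sketch is not Lester's program; it
assumes that the parts are 2-D points, that two parts belong
together whenever a chain of hops no longer than a chosen radius
connects them, and it uses a hypothetical function name. Grouping
by similarity would follow the same pattern with a distance
measured in a feature space instead of in the picture plane.

```python
from math import dist


def group_by_proximity(points, radius):
    """Return groups of point indices; two points share a group when a
    chain of hops, each no longer than `radius`, links them."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the representative of i's group.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= radius:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


points = [(0, 0), (1, 0), (2, 0), (10, 10), (11, 10)]
groups = group_by_proximity(points, radius=1.5)
```

Here the first three points form one group even though the two
end points are farther apart than the radius, because the middle
point links them.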

In summary, perceiving motion directly could provide a method
for  separating  the  effects  of  projective  transformations  and  3-D 
motions.   Once  these  have  been  isolated,  scene  organization  can 
be  based  on  "common  3-D  motion"  as  a  gestalt  grouping  mechanism. 
For  example,  a  still  picture  of  a  flock  of  brown  birds  flying 
together  between  the  branches  of  a  large  brown  tree  would  be  very 
difficult to analyze purely by the methods of region analysis,
edge analysis or binocular disparity analysis. There are
too many significant small regions and sharp edges to analyze,
and many weak edges separate tree branches from birds. A motion
detecting method, however, would group together all edges sharing
the same speed and direction. This idea is very much in keeping
with that of Gibson (BIBGEN 1966), who proposes that sensory
systems measure invariance of certain stimulus properties under
transformations rather than properties in isolation. Mechanisms for
detecting such invariances could be incorporated in integral
geometry schemes. These are generalizations of the Buffon needle
problem from statistics and probability and have the advantage
that  geometric  properties  can  be  measured  to  any  desired  accuracy 
independent  of  coordinate  systems  using  statistical  inference 
(Novikoff, BIBGEN 1962). A pool of edge detectors would be an
ideal candidate for embodying such mechanisms. To combine
the ideas of this paragraph with the discussion of PDE's in
analyzing static scenes, note that motion is the derivative of
position. Contrast feature detectors linked by delays can be used
as motion detectors. Relative motion of eye and image reduces
motion detection and edge detection to the same kind of
computation. Now, the systems of PDE's for static picture
analysis are augmented by
equations  in  which  derivatives  with  respect  to  time  are  involved. 
This  added  constraint  and  the  integral  relation  between  motion 
and  distance  ought  to  be  explored  for  possible  embodiment  in 
artificial  visual  systems. 
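
The statement that contrast detectors linked by delays can act
as motion detectors can be made concrete with a rough sketch. It
assumes a one-dimensional image sampled at two successive
instants, uses adjacent-cell differences as the contrast
detectors, and compares each detector's delayed output with the
current output of its neighbor. The function names and the
comparison by multiplication are illustrative assumptions, not a
mechanism taken from the references.

```python
def contrast(frame):
    """Finite-difference contrast detectors between adjacent cells."""
    return [b - a for a, b in zip(frame, frame[1:])]


def motion_response(previous_frame, current_frame):
    """Sum of delay-and-compare units: positive when an edge moves to
    the right, negative when it moves left, zero for a static image."""
    before = contrast(previous_frame)   # delayed detector outputs
    after = contrast(current_frame)     # current detector outputs
    pairs = range(len(before) - 1)
    rightward = sum(before[i] * after[i + 1] for i in pairs)
    leftward = sum(before[i + 1] * after[i] for i in pairs)
    return rightward - leftward


edge_at_3 = [0, 0, 0, 1, 1, 1, 1, 1]
edge_at_4 = [0, 0, 0, 0, 1, 1, 1, 1]

moving_right = motion_response(edge_at_3, edge_at_4)
moving_left = motion_response(edge_at_4, edge_at_3)
static = motion_response(edge_at_3, edge_at_3)
```

The same difference operator serves both purposes: applied across
space it marks the edge, and applied across space and a time
delay it reports the direction in which the edge moved.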

The  fact  that  the  new  approaches  suggested  in  the  preceding 
paragraphs  do  not  strongly  resemble  traditional  approaches  is  not 
meant to discredit the latter. All of them have elements which
are important stepping stones incorporated into the new
suggestions. Linguistic, traditionally AI, methods have not been
greatly discussed because I feel there are many geometric
semantic problems which can be solved to put us far ahead of
where we are now; at that time the conceptual, semantic structures
such as Minsky's and Winston's will be even more important than
they are now. Vision and intelligence can be treated separately,
though intelligent vision is more powerful than vision alone.
For example, the lowly housefly has excellent vision which is
used to navigate at high speed, avoiding collision in
environments containing far more complex objects and lighting
extremes than those current robot vision systems maneuver through.
It is doubtful that high level concepts and abstractions are
involved. It appears quite feasible to build a system with similar
capabilities. If in addition we incorporate knowledge of real world
objects and relations into such a system, so much the better, but
the latter is not necessary for excellent machine vision.


CHAPTER 4.   DESCRIPTION OF BIBLIOGRAPHY

The  bibliography  is  divided  into  four  parts  whose  names  are 
also  the  names  of  indirect  access  permanent  files  on  the  CDC  6600 
at the Courant Institute. The first three letters of the names
of these files are always BIB; remaining letters are acronyms for
the type of contents of each file. The format of these files is
such that copies onto Hollerith cards yield symbols compatible
with most computer systems, permitting portability. Each
bibliographic reference is formatted for easy visual perusal and
elementary information retrieval or updating using string
processing languages or standard text editors. To simplify these
tasks, all references are in a standard form that encompasses the
wide diversity of publication types, which range from
dissertations to textbooks. In this standard form, each reference
occupies three lines. The first three or four columns of each
line are reserved for keys, handy links for information retrieval
or quick visual reference. The first line of each entry contains
the author's name(s), last name first followed by initials followed
by the date of publication in parentheses (see BIBGIDE file which
follows this section for other format details). References in the
text are given by author(s) followed by the file name and date.
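
The kind of elementary retrieval this format is meant to support
can be sketched in Python. The sketch assumes the layout given in
the BIBGIDE file: key A for the author line, a two-letter type
key for the title line, key S for the source line, unkeyed
continuation lines, and a blank line after each reference. The
function name and the dictionary fields are hypothetical, and the
parser is deliberately simple (a continuation line of a title
that happened to begin with the word S would be misread).

```python
import re

TYPE_KEYS = {"BK", "CF", "DI", "IN", "JP", "ME", "TR"}


def parse_references(listing, quote="'"):
    """Parse blank-line-separated three-line references into dicts."""
    references = []
    for block in re.split(r"\n\s*\n", listing.strip()):
        entry, field = {}, None
        for line in block.splitlines():
            key, _, rest = line.partition(" ")
            if key == "A" and "authors" not in entry:
                field = "authors"
            elif key in TYPE_KEYS and "title" not in entry:
                field, entry["type"] = "title", key
            elif key == "S" and "title" in entry and "source" not in entry:
                field = "source"
            else:
                rest = line  # continuation line, which carries no key
            if field is not None:
                joined = entry.get(field, "") + " " + rest.strip()
                entry[field] = joined.strip()
        # The year of publication closes the author line in parentheses.
        year = re.search(r"\((\d{4})\)\s*$", entry.get("authors", ""))
        if year:
            entry["year"] = int(year.group(1))
            entry["authors"] = entry["authors"][:year.start()].strip()
        entry["title"] = entry.get("title", "").strip(quote)
        references.append(entry)
    return references


sample = """\
A  DUDA, R.O. AND HART, P.E. (1972)
JP   'USE OF THE HOUGH TRANSFORM TO DETECT LINES AND CURVES IN
     PICTURES'
S    CACM, VOL 15, PAGES 11-15.

A  HORN, B.K.P. (1974)
JP   'DETERMINING LIGHTNESS FROM AN IMAGE'
S    COMPUTER GRAPHICS AND IMAGE PROCESSING, VOL 3, PAGES 277-299.
"""
refs = parse_references(sample)
```

The `quote` argument is there because a printed listing may show
the title delimiter as a different symbol from the one punched on
the cards; passing that symbol strips it from the titles instead.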

The  division  of  the  bibliography  into  four  files  is  based  on 
document accessibility and topic. The file BIBSA contains
references closely related to scene analysis (hence SA in its
acronym) and publicly available in journals (key JP), books and
conference proceedings. The key JP stands for journal paper, in
which case the third line (key S for source) gives the name of the
journal, volume number, and pages in that order. The journals
Artificial Intelligence (American Elsevier, New York or
North-Holland, Amsterdam) and Computer Graphics and Image
Processing (Academic Press) contain the bulk of scene analysis
literature in this form. Another important source is the set of
Proceedings of the International Joint Conferences on Artificial
Intelligence, referred to as PIJCAI in the bibliography and
described at the bottom of the file BIBSA. Also important is the
Machine Intelligence series, proceedings of international workshops
in AI held in Edinburgh, Scotland. Information about this
collection is also listed at the bottom of the file; references to
it in the bibliography are abbreviated as MI in the third
or source line. Often
papers  in  both  these  collections  are  reworked  and  published  in 
the  journal  Artificial  Intelligence  as  can  be  deduced  by  looking 
at  authors  and  titles;  this  redundancy  is  useful  if  one  of  the 
sources  is  unavailable.   Finally,  these  collections  also  contain 
descriptions  of  robots  which  incorporate  scene  analysis  and 
other  methods  in  problem  solving  systems. 

Two  books  (key  BK  in  the  second  line  of  a  reference)  should  be 
singled  out  for  special  attention  in  BIBSA.   The  first,  a  textbook 
by  Duda  and  Hart  (BIBSA  1973)  gives  comprehensive  coverage  of 
the areas in its title, excellent overviews, and good
bibliographies. The second is Winston (BIBSA 1975), which could be
subtitled "Scene Analysis at MIT." Its coverage of the field of
scene analysis is not comprehensive, but it contains versions
of Minsky's paper on the concept of Frames, and Horn's and Waltz's
dissertations;  previously  these  were  only  available  as  technical 
reports. 

The file BIBPIC contains a small selection of references in
picture processing methods which were precursors of or are
connected with scene analysis. Most important are the surveys
by Rosenfeld, which lead to further references in picture
processing in a comprehensive and clearly presented way.

BIBGEN  contains  general  references  in  mathematics,  biology, 
and  AI  related  to  the  discussion  in  Chapters  1  and  3  primarily. 
The  Handbook  of  Sensory  Physiology,  Volume  VII  (the  first  entry 
in  BIBGEN)  is  the  most  complete  source  on  the  neurophysiology 
of  vision  and  can  lead  the  reader  to  further  references  in  that 
area.   Most  of  the  titles  of  other  references  in  BIBGEN  are  self- 
explanatory. 

BIBTR  lists  technical  reports,  dissertations  and  memos  (keys 
TR,  DI  and  ME,  respectively,  on  the  title  line)  and  similar 
documents related to scene analysis but not easily accessible.
The source line (key S) lists the institution, department, degree
if dissertation, and report number if any, in that order. Most
of the work listed in BIBTR was carried out at the AI labs at MIT
or Stanford University, California. These two institutions have
strong professional interconnections and in the past have
received considerable federal funding for AI research. Hence
dissertations usually become technical reports, the most
accessible type of document in BIBTR. Even if the technical reports
are unavailable, the dissertation titles give the reader an idea
of the major research interests of their authors; seeing the
names later in journal articles can lead to a good guess about
the topic of an article. In addition, revised versions of
papers listed in BIBTR often find their way into those listed
in BIBSA; thus, to find a more publicly accessible form of
a  paper  one  need  only  match  authors  and  seek  similar  titles  in 
BIBSA. 

The  least  accessible  documents  in  BIBTR  are  memos,  papers 
intended only for internal circulation within the source
institution. Those of the artificial vision research group at MIT
are often called Vision Flashes (abbreviated VF in the
bibliography); other MIT memos in AI are often called Working
Papers (abbreviated WP in the bibliography).

For  information  on  obtaining  the  bibliography  files  as  a  deck 
of  Hollerith  cards,  write  to  Malcolm  Harrison,  the  principal 
investigator  of  the  sponsoring  grant,  at  the  address  on  the  title 
page of this report. In the listing of the deck which follows,
each single quotation mark (') comes out as a not-equal sign (≠);
the former was removed from the printer chain at the Courant
Institute to make room for special symbols. The card code is a
4-8 punch which is interpreted as a single quotation mark by
most systems.


BIBGIDE:  GUIDE TO FILES FOR SCENE ANALYSIS SURVEY BIBLIOGRAPHY.


BIBGEN   -  BIBLIOGRAPHY OF GENERAL TOPICS SUCH AS MATH AND
            NEUROPHYSIOLOGY RELATED TO THIS SURVEY

BIBPIC   -  BIBLIOGRAPHY OF PICTURE PROCESSING AND EDGE DETECTION

BIBSA    -  BIBLIOGRAPHY OF EXPLICIT SCENE ANALYSIS PAPERS AND
            BOOKS, PUBLICLY MARKETED (JOURNALS AND PUBLISHERS)

BIBTR    -  BIBLIOGRAPHY OF TECHNICAL REPORTS AND SIMILARLY HARD
            TO GET DOCUMENTS CLOSELY RELATED TO SCENE ANALYSIS

1)  IN ANY FILE EACH REFERENCE CONSISTS OF THREE LINES; THE FIRST
ONE OR TWO COLUMNS OF EACH LINE HOLD A KEY WHICH REFERS TO THE KIND
OF INFORMATION CONTAINED ON THAT LINE.  THE FIRST LINE ALWAYS CONTAINS
THE AUTHOR≠S NAME(S), LAST NAME FIRST STARTING IN COLUMN FOUR,
FOLLOWED BY INITIALS AND THEN THE YEAR OF PUBLICATION IN PARENTHESES;
THE KEY TO THE FIRST LINE IS ≠A≠ (FOR AUTHOR) IN COLUMN 1.

2)  THE SECOND LINE IS ALWAYS THE TITLE ENCLOSED IN SINGLE QUOTES
STARTING IN COLUMN SIX, BUT THE TWO LETTER KEY IN COLUMNS ONE AND
TWO GIVES THE TYPE OF PUBLICATION AS FOLLOWS:

     BK   FOR BOOK
     CF   FOR CONFERENCE PROCEEDINGS (UNLESS PUBLISHED AS A BOOK
          WHICH IS PUBLICLY MARKETED)
     DI   FOR A DISSERTATION OR THESIS
     IN   FOR A PAPER IN SOME OTHER WORK
     JP   FOR JOURNAL PAPER
     ME   FOR MEMO
     TR   FOR TECHNICAL REPORT

3)  THE THIRD LINE GIVES THE PUBLISHING SOURCE (KEY S IN COLUMN
ONE) OF A WORK STARTING IN COLUMN SIX.  EXAMPLES OF THIS ARE
THE PUBLISHER FOR A BOOK, THE NAME OF A JOURNAL, THE INSTITUTION
FOR A TECHNICAL REPORT OR THESIS, THE AUTHOR(EDITOR) AND TITLE FOR
A COLLECTION A PAPER APPEARS IN.

WHERE CONTINUATION LINES ARE NECESSARY, THEY APPEAR WITHOUT KEYS.
A BLANK LINE FOLLOWS EACH REFERENCE.  SEE MODELS BELOW:

A  LASTNAM1, I.J., LASTNAM2, K.L. AND LASTNAM3, M.N. (YEAR)
JP   ≠TITLE OF PAPER≠
S    JOURNAL TITLE, VOL 3, PAGES 234-345.

A  LASTNAM, I.J. (YEAR)
TR   ≠TITLE OF TECHNICAL REPORT≠
S    INSTITUTION, DEPARTMENT, LOCATION, TECH REPORT NUMBER.


BIBGEN:  GENERAL REFERENCES IN MATH, BIOLOGY, NEUROPHYSIOLOGY,
ARTIFICIAL INTELLIGENCE, PSYCHOLOGY, AND PHILOSOPHY.

************************************************************

A  ASSORTED EDITORS (1972 AND LATER FOR VARIOUS PARTS)
BK   ≠HANDBOOK OF SENSORY PHYSIOLOGY≠, VOLUME VII, VISION
S    SPRINGER-VERLAG, BERLIN AND NEW YORK.

A  BUSEMANN, A. AND KELLY, P.J. (1953)
BK   ≠PROJECTIVE GEOMETRY AND PROJECTIVE METRICS≠
S    ACADEMIC PRESS, NEW YORK.

A  CHANDRASEKARAN, B. AND REEKER, L.H. (1972)
TR   ≠ARTIFICIAL INTELLIGENCE- A CASE FOR AGNOSTICISM≠
S    OHIO STATE UNIV, COMP AND INFO SCI DEPT, CISRC-TR-72-9.

A  COXETER, H.S.M. (1964)
BK   ≠PROJECTIVE GEOMETRY≠
S    BLAISDELL PUB CO, NEW YORK.

A  DEUTSCH, S. (1966)
JP   ≠CONJECTURES ON MAMMALIAN NEURON NETWORKS FOR VISUAL
     PATTERN RECOGNITION≠
S    TRANS IEEE, SSC2, DECEMBER, PAGE 81.

A  DODWELL, P.C. (1970)
BK   ≠VISUAL PATTERN RECOGNITION≠
S    HOLT, RINEHART, AND WINSTON, NEW YORK.

A  DREYFUS, H.L. (1972)
BK   ≠WHAT COMPUTERS CAN≠T DO - A CRITIQUE OF ARTIFICIAL REASON≠
S    HARPER AND ROW, NEW YORK.

A  FIRSCHEIN, O., FISCHLER, M.A., COLES, L.S. AND TENENBAUM, J.M. (1973)
IN   ≠FORECASTING AND ASSESSING THE IMPACT OF AI ON SOCIETY≠
S    PIJCAI 3, STANFORD, PAGES 105-120.

A  GIBSON, J.J. (1966)
BK   ≠THE SENSES CONSIDERED AS PERCEPTUAL SYSTEMS≠
S    HOUGHTON-MIFFLIN, BOSTON.

A  HABER, R.N. AND HERSHENSON, M. (1973)
BK   ≠THE PSYCHOLOGY OF VISUAL PERCEPTION≠
S    HOLT, RINEHART, AND WINSTON, NEW YORK.

A  HERMAN, G.T. AND ROZENBERG, G. (1975)
BK   ≠DEVELOPMENTAL SYSTEMS AND LANGUAGES≠
S    NORTH-HOLLAND, AMSTERDAM.

A  HEWITT, C. (1971)
IN   ≠PROCEDURAL EMBEDDING OF KNOWLEDGE IN PLANNER≠
S    PIJCAI 2, LONDON, PAGES 167-182.

A  JOHANSSON, G. (1975)
JP   ≠VISUAL MOTION AND PERCEPTION≠
S    SCIENTIFIC AMERICAN, VOL 232, JUNE, PAGES 76-88.


A  JULESZ, B. (1971)
BK   ≠FOUNDATIONS OF CYCLOPEAN PERCEPTION≠
S    UNIVERSITY OF CHICAGO PRESS, CHICAGO, ILL.

A  LUNEBURG, R.K. (1947)
BK   ≠MATHEMATICAL ANALYSIS OF BINOCULAR VISION≠
S    DARTMOUTH EYE INSTITUTE, HANOVER, NEW HAMPSHIRE.

A  MCCARTHY, J. AND HAYES, P.J. (1969)
IN   ≠SOME PHILOSOPHICAL PROBLEMS FROM THE STANDPOINT OF AI≠
S    MI 4, PAGES 463-502.

A  MCCULLOCH, W.S. (1945)
JP   ≠A HETERARCHY OF VALUES DETERMINED BY THE TOPOLOGY OF NERVOUS
     NETS≠
S    BULLETIN OF MATHEMATICAL BIOPHYSICS, VOL 7, PAGES 89-93.

A  MCCULLOCH, W.S. (1965)
BK   ≠EMBODIMENTS OF MIND≠
S    MIT PRESS, CAMBRIDGE, MASS.

A  MINSKY, M. AND PAPERT, S. (1967)
BK   ≠PROJECT MAC PROGRESS REPORT IV≠
S    MIT PRESS, CAMBRIDGE, MASS.


A  NEWMAN, W.M. AND SPROULL, R.F. (1973)
BK   ≠PRINCIPLES OF INTERACTIVE COMPUTER GRAPHICS≠
S    MCGRAW-HILL, NEW YORK.

A  NILSSON, N.J. (1971)
BK   ≠PROBLEM SOLVING METHODS IN AI≠
S    MCGRAW-HILL, NEW YORK.

A  NOVIKOFF, A. (1962)
IN   ≠INTEGRAL GEOMETRY AS A TOOL IN PATTERN PERCEPTION≠
S    VON FOERSTER AND ZOPF (EDS), ≠PRINCIPLES OF SELF-ORGANIZATION≠
     PERGAMON PRESS, MACMILLAN, NEW YORK, PAGES 347-368.

A  PLATT, J.R. (1960)
JP   ≠HOW WE SEE STRAIGHT LINES≠
S    SCIENTIFIC AMERICAN, VOL 202, JUNE, PAGES 121-129.

A  RATLIFF, F. (1965)
BK   ≠MACH BANDS: QUANTITATIVE STUDIES ON NEURAL NETWORKS
     IN THE RETINA≠
S    HOLDEN-DAY, SAN FRANCISCO.

A  REISS, R.F. (EDITOR) (1964)
BK   ≠NEURAL THEORY AND MODELING≠
S    STANFORD UNIV PRESS, STANFORD, CALIF.

A  ROTHSTEIN, J. (1956)
BK   ≠INFORMATION, LOGIC AND PHYSICS≠
S    PHILOSOPHY OF SCIENCE, VOL 23, PAGES 31-35.

A  SMITH, M.H. AND COLES, L.S. (1973)
IN   ≠DESIGN OF A LOW COST, GENERAL PURPOSE ROBOT≠
S    PIJCAI 3, STANFORD, PAGES 324-335.

A  SPRINGER, C.E. (1964)
BK   ≠GEOMETRY AND ANALYSIS OF PROJECTIVE SPACES≠
S    W. H. FREEMAN, SAN FRANCISCO.

A  TRAUB, J.F. (1973)
BK   ≠COMPLEXITY OF SEQUENTIAL AND PARALLEL NUMERICAL ALGORITHMS≠
S    ACADEMIC PRESS, NEW YORK.

A  TURING, A.M. (1947)
IN   ≠INTELLIGENT MACHINERY≠
S    MI 5, 1970, PAGES 3-23.

A  VON SENDEN, M. (1960)
BK   ≠SPACE AND SIGHT≠
S    THE FREE PRESS, GLENCOE, ILLINOIS.

A  WILCZYNSKI, E.J. (1905)
BK   ≠PROJECTIVE DIFFERENTIAL GEOMETRY OF CURVES AND RULED SURFACES≠
S    CHELSEA, NEW YORK, (1961 REPRINT OF 1905 TEUBNER PUBLICATION).

A  WINOGRAD, T. (1972)
BK   ≠UNDERSTANDING NATURAL LANGUAGE≠
S    ACADEMIC PRESS, NEW YORK.

************************************************************


BIBPIC:  BIBLIOGRAPHY OF PAPERS, BOOKS, AND CONFERENCES ON
PICTURE PROCESSING, PATTERN RECOGNITION AND EDGE DETECTION.

A  DAVIS, L.S. (1975)
JP   ≠A SURVEY OF EDGE DETECTION TECHNIQUES≠
S    COMPUTER GRAPHICS AND IMAGE PROCESSING, VOL 4, PAGES 248-270.

A  DUDA, R.O. AND HART, P.E. (1972)
JP   ≠USE OF THE HOUGH TRANSFORM TO DETECT LINES AND CURVES IN
     PICTURES≠
S    CACM, VOL 15, PAGES 11-15.

A  FREEMAN, H. (1974)
JP   ≠COMPUTER PROCESSING OF LINE DRAWING IMAGES≠
S    ACM COMPUTING SURVEYS, VOL 6, PAGES 57-97.

A  GERARDIN, L.A. AND FLAMENT, J. (1969)
IN   ≠GEOMETRICAL PATTERN FEATURE EXTRACTION BY PROJECTION ON
     HAAR ORTHONORMAL BASIS≠
S    PIJCAI 1, WASH DC, PAGES 65-78.

A  GRASSELLI, A. (ED) (1969)
BK   ≠AUTOMATIC INTERPRETATION AND CLASSIFICATION OF IMAGES≠
S    ACADEMIC PRESS, NEW YORK.

A  GRIFFITH, A.K. (1971)
IN   ≠MATHEMATICAL MODELS FOR AUTOMATIC LINE DETECTION≠
S    PIJCAI 2, LONDON, PAGES 17-26.

A  GRIFFITH, A.K. (1973)
JP   ≠MATHEMATICAL MODELS FOR AUTOMATIC LINE DETECTION≠
S    JACM, VOL 20, PAGES 62-80.

A  HORN, B.K.P. (1974)
JP   ≠DETERMINING LIGHTNESS FROM AN IMAGE≠
S    COMPUTER GRAPHICS AND IMAGE PROCESSING, VOL 3, PAGES 277-299.

A  HUECKEL, M.H. (1971)
JP   ≠AN OPERATOR WHICH LOCATES EDGES IN DIGITIZED PICTURES≠
S    JACM, VOL 18, PAGES 113-125.

A  HUECKEL, M.H. (1973)
JP   ≠A LOCAL VISUAL OPERATOR WHICH RECOGNIZES EDGES AND LINES≠
S    JACM, VOL 20, PAGES 634-647.

A  KANEFF, S. (ED) (1970)
BK   ≠PICTURE LANGUAGE MACHINES≠
S    ACADEMIC PRESS, NEW YORK.

A  LEVINE, M.D., O≠HANDLEY, D.A. AND YAGI, G.M. (1973)
JP   ≠COMPUTER DETERMINATION OF DEPTH MAPS≠
S    COMPUTER GRAPHICS AND IMAGE PROCESSING, VOL 2, PAGES 131-150.

A  LIPKIN, B.S. AND ROSENFELD, A. (EDS) (1970)
BK   ≠PICTURE PROCESSING AND PSYCHOPICTORICS≠
S    ACADEMIC PRESS, NEW YORK.

A  O≠GORMAN, F. AND CLOWES, M.B. (1973)
IN   ≠FINDING EDGES THROUGH COLLINEARITY OF FEATURE POINTS≠
S    PIJCAI 3, STANFORD, PAGES 543-555.

A  MILLER, W. AND SHAW, A. (1968)
IN   ≠LINGUISTIC METHODS IN PICTURE PROCESSING - A SURVEY≠
S    AFIPS PROC FALL JOINT COMP CONF, VOL 33, PAGE 279.

A  PINGLE, K.K. AND TENENBAUM, J.M. (1971)
IN   ≠AN ACCOMMODATING EDGE FOLLOWER≠
S    PIJCAI 2, LONDON, PAGES 1-7.

A  QUAM, L.H. (1971)
DI   ≠COMPUTER COMPARISON OF PICTURES≠
S    STANFORD UNIV, COMPUTER SCI DEPT, PH D, AIM-144.

A  RAMER, U.E. (1975)
JP   ≠EXTRACTION OF LINE STRUCTURES FROM PHOTOGRAPHS OF CURVED OBJECTS≠
S    COMPUTER GRAPHICS AND IMAGE PROCESSING, VOL 4, PAGES 81-103.

A  REED, S.K. (1973)
BK   ≠PSYCHOLOGICAL PROCESSES IN PATTERN RECOGNITION≠
S    ACADEMIC PRESS, NEW YORK.

A  ROSENFELD, A. (1969)
BK   ≠PICTURE PROCESSING BY COMPUTER≠
S    ACADEMIC PRESS, NEW YORK.

A  ROSENFELD, A. (1969)
JP   ≠PICTURE PROCESSING BY COMPUTER≠
S    ACM COMPUTING SURVEYS, VOL 1, PAGES 147-176.

A  ROSENFELD, A. (1972)
JP   ≠PICTURE PROCESSING: 1972≠
S    COMPUTER GRAPHICS AND IMAGE PROCESSING, VOL 1, PAGES 394-410.

A  ROSENFELD, A. (1973)
JP   ≠PROGRESS IN PICTURE PROCESSING: 1969-1971≠
S    ACM COMPUTING SURVEYS, VOL 5, PAGES 81-108.

A  ROSENFELD, A. (1974)
JP   ≠PICTURE PROCESSING: 1973≠
S    COMPUTER GRAPHICS AND IMAGE PROCESSING, VOL 3, PAGES 178-194.

A  ROSENFELD, A. (1975)
JP   ≠PICTURE PROCESSING: 1974≠
S    COMPUTER GRAPHICS AND IMAGE PROCESSING, VOL 4, PAGES 133-155.

A  ROTHSTEIN, J. AND WEIMAN, C.F.R. (1976)
JP   ≠PARALLEL AND SEQUENTIAL SPECIFICATION OF A CONTEXT SENSITIVE
     LANGUAGE FOR STRAIGHT LINES ON GRIDS≠
S    COMPUTER GRAPHICS AND IMAGE PROCESSING (TO APPEAR).

A  SMITH, M.W. AND DAVIS, W.A. (1975)
JP   ≠A NEW ALGORITHM FOR EDGE DETECTION≠
S    COMPUTER GRAPHICS AND IMAGE PROCESSING, VOL 4, PAGES 55-62.

A  STAMOPOULOS, C.D. (1975)
JP   ≠PARALLEL IMAGE PROCESSING≠
S    IEEE TC, VOL C-24, PAGES 424-433.

A  TOU, J.T. AND GONZALEZ, R.C. (1974)
BK   ≠PATTERN RECOGNITION PRINCIPLES≠
S    ADDISON-WESLEY, NEW YORK.

A  TOUSSAINT, G.T. (1975)
JP   ≠SUBJECTIVE CLUSTERING AND BIBLIOGRAPHY OF BOOKS ON
     PATTERN RECOGNITION≠
S    INFORMATION SCIENCES, VOL 6, PAGES 251-257.

A  UHR, L. (1973)
BK   ≠PATTERN RECOGNITION, LEARNING, AND THOUGHT≠
S    PRENTICE-HALL, NEW YORK.

A  YOUNG, J.F. (1973)
BK   ≠ROBOTICS≠
S    BUTTERWORTHS, LONDON


BIBSA:   BIBLIOGRAPHY OF BOOKS, JOURNAL ARTICLES AND SIMILAR
PUBLICLY MARKETED PUBLICATIONS CONTAINING EXPLICIT
SCENE ANALYSIS MATERIAL.

************************************************************

A       AGIN»    G.J.    AND    BlNFQRD*     T.O.     (1973) 

IN         i«COMPUTER    DESCRIPTION    OF    CURVED    OBJECTS)* 

S  PIJCAI    3,     STANFORD*    PAGES    629-6^0. 

A       BARROW*    C.R.    AND    PQPPLESTONE*     R.J.     (1971) 

IN         »»RELATIONAL    DESCRIPTIONS    IN    PICTURE    PROCESSING** 

S  MI    6*     PAGES    377-396. 

A   BRICE*  C.  R.  AND  FENEMA,  C  L.  (1970) 

JP    KSCENE  ANALYSIS  USING  REGIONS^ 

S     ARTIFICIAL  INTELLIGENCE*  VOL  1*  PAGES  205-226. 

A   CLOWES,  M.B.  (1971)

JP    /ON  SEEING  THINGS/

S     ARTIFICIAL  INTELLIGENCE,  VOL  2,  PAGES  79-116.

A   DUDA,  R.O.  AND  HART,  P.E.  (1973)

BK    /PATTERN  CLASSIFICATION  AND  SCENE  ANALYSIS/

S     WILEY,  NEW  YORK.

A   FAHLMAN,  S.E.  (1974)

JP    /A  PLANNING  SYSTEM  FOR  ROBOT  CONSTRUCTION  TASKS/
S     ARTIFICIAL  INTELLIGENCE,  VOL  5,  PAGES  1-49.

A   FALK,  G.  (1971)

IN    /SCENE  ANALYSIS  BASED  ON  IMPERFECT  EDGE  DATA/

S     PIJCAI  2,  LONDON,  PAGES  6-16.

A   FALK,  G.  (1972)

JP    /INTERPRETATION  OF  IMPERFECT  LINE  DATA  AS  A
THREE  DIMENSIONAL  SCENE/
S     ARTIFICIAL  INTELLIGENCE,  VOL  3,  PAGES  101-144.

A   FELDMAN,  J.A.,  FELDMAN,  G.M.,  FALK,  G.,  GRAPE,  G.,  PEARLMAN,  J.,
SOBEL,  I.,  AND  TENENBAUM,  J.M.  (1973)
IN    /THE  STANFORD  HAND-EYE  PROJECT/
S     PIJCAI  1,  WASH  DC,  PAGES  521-526.

A   FELDMAN,  J.A.  AND  YAKIMOVSKY,  Y.  (1974)

JP    /DECISION  THEORY  AND  ARTIFICIAL  INTELLIGENCE:  I.  A  SEMANTICS-BASED
REGION  ANALYSER/

S     ARTIFICIAL  INTELLIGENCE,  VOL  5,  PAGES  349-371.

A   GUZMAN,  A.  (1968)

IN    /DECOMPOSITION  OF  A  VISUAL  SCENE  INTO  THREE  DIMENSIONAL  BODIES/

S     PROC  OF  THE  FJCC,  VOL  33,  PAGES  291-304.


A   HUFFMAN,  D.A.  (1971) 

IN    /IMPOSSIBLE  OBJECTS  AS  NONSENSE  SENTENCES/

S     MI  6,  PAGES  295-323. 

A   MACKWORTH,  A.K.   (1973) 

JP    /INTERPRETING  PICTURES  OF  POLYHEDRAL  SCENES/

S     ARTIFICIAL  INTELLIGENCE,  VOL  4,  PAGES  121-137.

A   MACKWORTH,  A.K.  (1973) 

IN    /INTERPRETING  PICTURES  OF  POLYHEDRAL  SCENES/ 

S     PIJCAI  3,  STANFORD,  PAGES  556-563. 

A   MINSKY,  M.  AND  PAPERT,  S.  (1969) 

BK    /PERCEPTRONS,  AN  INTRODUCTION  TO  COMPUTATIONAL  GEOMETRY/ 

S     MIT  PRESS,  CAMBRIDGE,  MASS. 

A   NEVATIA,  R.  AND  BINFORD,  T.  (1973) 

IN    /STRUCTURED  DESCRIPTIONS  OF  COMPLEX  OBJECTS/ 

S     PIJCAI  3,  STANFORD,  PAGES  641-647. 

A   NILSSON,  N.J.  (1969) 

IN    /A  MOBILE  AUTOMATON:  AN  APPLICATION  OF  AI  TECHNIQUES/

S     PIJCAI  1,  WASH  DC,  PAGES  509-520. 

A   ROBERTS,  L.G.  (1965) 

IN    /MACHINE  PERCEPTION  OF  THREE-DIMENSIONAL  OBJECTS/ 

S     TIPPET,  J.T.  ET  AL  (EDS)  /OPTICAL  AND  ELECTRO-OPTICAL  INFORMATION 
PROCESSING/,  MIT  PRESS,  PAGES  159-197. 

A   SHIRAI,  Y.  (1973) 

JP    /A  CONTEXT-SENSITIVE  LINE  FINDER  FOR  RECOGNITION  OF  POLYHEDRA/ 

S     ARTIFICIAL  INTELLIGENCE,  VOL  4,  PAGES  95-120. 

A   SHIRAI,  Y.  AND  SABURO,  T.  (1971) 

IN    /EXTRACTION  OF  THE  LINE  DRAWINGS  OF  3-D  OBJECTS  BY  SEQUENTIAL 

ILLUMINATION/ 
S     PIJCAI  2,  LONDON,  PAGES  71-87. 

A   SHIRAI,  Y.  AND  SUWA,  M.  (1971) 

IN    /RECOGNITION  OF  POLYHEDRONS  WITH  A  RANGE  FINDER/ 

S     PIJCAI  2,  LONDON,  PAGES  80-87. 

A   SOBEL,  I.  (1973) 

IN    /ON  CALIBRATING  COMPUTER  CONTROLLED  CAMERAS  FOR  PERCEIVING 

3-D  SCENES/ 
S     PIJCAI  3,  STANFORD,  PAGES  648-652. 

A   SOBEL,  I.   (1974) 

JP    /ON  CALIBRATING  COMPUTER  CONTROLLED  CAMERAS  FOR  PERCEIVING  3-D 

SCENES/ 
S     ARTIFICIAL  INTELLIGENCE,  VOL  5,  PAGES  185-198. 

A   TOMITA,  F.  AND  MASAHIKO,  Y.  (1973) 

IN    /DETECTION  OF  HOMOGENEOUS  REGIONS  BY  STRUCTURAL  ANALYSIS/ 

S     PIJCAI  3,  STANFORD,  PAGES  564-571. 
A   WILL,  P.M.  AND  PENNINGTON,  K.S.  (1971)

JP    /GRID  CODING:  A  PREPROCESSING  TECHNIQUE  FOR  ROBOT  AND  MACHINE
VISION/
S     ARTIFICIAL  INTELLIGENCE,  VOL  2,  PAGES  319-329.

A   WILL,  P.M.  AND  PENNINGTON,  K.S.  (1971)

IN    /GRID  CODING:  A  PREPROCESSING  TECHNIQUE  FOR  ROBOT  AND  MACHINE
VISION/
S     PIJCAI  2,  LONDON,  PAGES  66-70.

A   WINSTON,  P.H.  (1972)
IN    /THE  M.I.T.  ROBOT/